prisma / prisma

Next-generation ORM for Node.js & TypeScript | PostgreSQL, MySQL, MariaDB, SQL Server, SQLite, MongoDB and CockroachDB
https://www.prisma.io
Apache License 2.0

Multi-dimensional arrays #15676

Open SorenHolstHansen opened 1 year ago

SorenHolstHansen commented 1 year ago

Problem

I am missing 2- and 3-dimensional arrays for my current project in MongoDB.

Suggested solution

Enable 2- and 3-dimensional arrays, at least for MongoDB. Preferably this would just be an extension of the current way of denoting arrays, so for instance:

model MyModel {
  ...
  some2darray   String[][]
  some3darray   Int[][][]
  ...
}

These would have the same limitations as 1-dimensional arrays (i.e. they cannot be optional) and be supported on the same databases that support 1-dimensional arrays: PostgreSQL, CockroachDB, and MongoDB.

Preferably it would also support arrays of composite types on MongoDB, but that might be a separate issue.

Alternatively, if the above syntax is not possible, one could denote the arrays like

model MyModel {
  ...
  some2darray   [[String]]
  some3darray  [[[Int]]]
  ...
}
janpio commented 1 year ago

What would that look like in Prisma schema? What does a data example for this look like?

SorenHolstHansen commented 1 year ago

Sorry for the bad description; I have updated it now.

arthurfiorette commented 1 year ago

Any plans to implement this? It is painful to have to create wrapper types just because direct 2D arrays aren't supported.


model A {
  arr IntArr[]
}

type IntArr {
  array Int[]
}

// Would be just

model A {
  arr Int[][]
}

If #3219 gets implemented, this should probably be done as part of it.
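With the wrapper-type workaround above, every read and write has to convert between the composite-type shape and a plain 2D array at the application boundary. A minimal TypeScript sketch of that conversion (type and function names are hypothetical, following the schema sketch in this comment):

```typescript
// Shape the client would expose for the composite-type workaround above.
type IntArr = { array: number[] };

// Wrap a plain 2D array into the composite-type shape for writes.
function toWrapped(rows: number[][]): IntArr[] {
  return rows.map((row) => ({ array: [...row] }));
}

// Unwrap what the client returns back into a plain 2D array for reads.
function toNested(wrapped: IntArr[]): number[][] {
  return wrapped.map((w) => [...w.array]);
}
```

With native `Int[][]` support, both helpers (and the extra composite type in the schema) would disappear.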

sepehr500 commented 1 year ago

We are running into this issue as well, but on PostgreSQL. It would be great to support 2D arrays.

riyavsinha commented 1 year ago

+1. As an example use case, many genomics "motif" sequences are stored as short multidimensional arrays representing probabilities of different DNA nucleotides at each position.
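To illustrate that use case, such a motif is typically a position weight matrix: one row per sequence position, one probability per nucleotide. A TypeScript sketch with made-up values:

```typescript
// A position weight matrix: rows are sequence positions, columns are
// probabilities for the nucleotides A, C, G, T (values are illustrative).
const motif: number[][] = [
  [0.7, 0.1, 0.1, 0.1],    // position 1: mostly A
  [0.05, 0.05, 0.85, 0.05], // position 2: mostly G
];
// Each row sums to 1 — exactly the short 2D array shape this issue asks for.
```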

mwillbanks commented 6 months ago

I'd like to provide a bit more context on the use case here. Typically an array is used to represent a vector of some type. Most commonly you'll see this with various metrics; take for example the following schema:

model SomeMetric {
  someId        String            @id
  type          MetricType
  series        Int[][2]          @default([])
  min           Int               @default(0)
  max           Int               @default(0)
  avg           Float             @default(0)
  sum           Int               @default(0)
  count         Int               @default(0)
  createdAt     DateTime          @default(now())
  updatedAt     DateTime          @updatedAt

  some Some @relation(fields: [someId], references: [someId], onDelete: Cascade)
}

What does this allow us to do? I can now store my data in a format like [[EPOCH_TS, VALUE], ...].

Querying this data is highly efficient, and I can calculate the other columns either on the fly (using generated columns in Postgres) or at write time. Given the count, the max, min, avg, and sum can all be maintained as part of the same update query. Selecting out the data points is also straightforward:

SELECT
    type,
    to_timestamp(series[i][1]) AS timeStamp,
    series[i][2] AS value
FROM
    "SomeMetric",
    generate_subscripts(series, 1) AS i;

That's a simplistic query, but you can also select specific slices of the data — say, the first 10 records only:

SELECT
    type,
    to_timestamp(series[i][1]) AS timeStamp,
    series[i][2] AS value
FROM
    "SomeMetric",
    generate_subscripts(series, 1) AS i
WHERE
    i <= 10;
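The incremental maintenance of the summary columns mentioned above — updating min/max/avg/sum/count alongside each appended point, so the whole series never needs re-reading — can be sketched in TypeScript (the real thing would run as part of the SQL update; names here are hypothetical):

```typescript
// Running summary matching the columns in the schema sketch above.
type MetricSummary = { min: number; max: number; avg: number; sum: number; count: number };

// Fold one new VALUE into the summary in O(1), without touching the series.
function appendPoint(s: MetricSummary, value: number): MetricSummary {
  const count = s.count + 1;
  const sum = s.sum + value;
  return {
    // On the first point, the zero-initialized defaults must not win min/max.
    min: count === 1 ? value : Math.min(s.min, value),
    max: count === 1 ? value : Math.max(s.max, value),
    sum,
    count,
    avg: sum / count,
  };
}
```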

Now, the real power of this comes in manipulating the data: using array concatenation (`series || ARRAY[[ts, value]]` for a 2D array, or `array_append`/`array_prepend` for 1D arrays), you can add onto the end or the beginning of the structure without having to read the entire thing into memory, unlike a JSON object or JSON array.

Not having this support makes things substantially more difficult and less performant. While we can fall back on `Unsupported(...)`, that creates substantial obstacles because we can't tell Prisma how to read the data back.
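For reference, the `Unsupported(...)` escape hatch mentioned above would look roughly like this (a sketch; Prisma migrates the column, but the generated client can neither read nor write the field, which is exactly the obstacle described):

```prisma
model SomeMetric {
  someId String @id
  series Unsupported("integer[][]")
}
```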