Collections issue : Trying to query across entities using collections but getting error

apoorv-yadav commented 3 months ago

Describe the bug Trying to use collection feature to query across 2 separate entities (employee and tasks) for fetching all tasks associated with employee, along with employee details. But Its not working since I have mapped PK and SK with attributes.

ElectroDB Version : 2.14.2

ElectroDB Playground Link Playground link : https://electrodb.fun/?#code/PQKgBAsg9gJgpgGzARwK5wE4Es4GcA0YuccYGeqCALgUQBYCG5YA7llXWAGbZwB2MXGBDAAUKKwBbAA5QMVMAG8wAUT5V2AT0IBlTADcsAY1IBfbhiiSwAIkRwjVSzABGNgNzijUPrgVUGFwRSAF5bbxkGPk0PLx8-MDgZBChNEjAwvjgWVXUtAApRMCUi4rBJWEQALhKysv4NKk0au2TUkht8Urr9TFwsHxaARk7usuIMQxMWgNwAawZpaRsx0y66hionLBdUKjwaxTH6trS4AEkYQ+O6sqbpOBqAcj9sPgBzJ-Xbn64cBCuYCeAAUANJfG7FNaQsCoCaSOABa4-H73R5A15YD4QlG3P6IQFPHTg74-aF1cllLHwAAeB1qt2IWyx71wyNxZW8CGCjgGfBmDHmuFGHLK0jm7NFdXxAOeYJxUrqEVk-X2NQA2k8ktIUmdLk8ALqkxVgfbJTborWnEgAYgAJIptbqSJdTAqTWAjIKWc8+D44BCYRTjbj5pKTTLCcT3SblVBVejNXDMAiAoaQ1KzTqLc9UwwHcmMHm3RnRV7+h9ff6Y7jKWTjaZSpTlAEgmZRABKTyiby+fyCuYZMBZHJqRqaQrFI51CrwBDhn4NLTPWZzGu3XoYfqDIFDdd1CZTS2rgCCSz0k2MAYzdbKm22u32bIZuKd7QugOnJrRz0x2NLKKRnKJJBmAt6ogOlwLhyP4Yts-6gdK-xRiBorgbcwDAGAADyHCYGA97YI+eA3Oh1JwHSz5friLiaFBL5SlyPIaDuNirsKAE-OK0GKkBQLypxHJxgmGpWjq776kaiEolmCA5kCb5nPajrWh+JbST85Y+kCfpZPuKLoaGEoMR6fFEqhHqclYKrsImTyrpJgm4rJ8n2QOykOTA6mWZy3qVjp1ZOWUhnBTe3zNqagTBGBnbdr2CSLNIQ4jmAF5HvkyiKSQhCrmBXZxH2iSqZcQ42DAAB0dCoJofB0LEoiYbCxAEZ6UDcg4LF8KaUAoOgGCaOUcikBwURgP6iR5E0BEKAwppSHAoiJeVTEdXyuClOV7EZUV4l6jAeUbe8UAZdIDDvPSTwMNyTymPlohAA

Entity/Service Definitions

const employee = new Entity(
  {
    model: {
      entity: "employee",
      version: "1",
      service: "taskapp"
    },
    attributes: {
       employeeId: {
            type: 'string',
            field: 'PK',
        },
        usermeta: {
            type: 'string',
            field: 'SK',
        },
    },
    indexes: {
      settings: {
            collection: "tasks",
            pk: {
                field: 'PK',
                composite: ['employeeId'],
                template: 'employee#${employeeId}',
                casing: 'none',

            },
            sk: {
                field: 'SK',
                composite: ['usermeta'],
                template: 'meta${usermeta}',
                casing: 'none',
            },
        },
    }
  },
  { table }
);

const task = new Entity(
  {
    model: {
          entity: 'task',
          version: '1',
          service: 'taskAppService',
      },
      attributes: {
          employeeId: {
              type: 'string',
              field: 'PK',
          },
          taskId: {
              type: 'string',
              field: 'SK',
          },
          // Other attributes
      },
      indexes: {
          byId: {
              collection: "tasks",
              pk: {
                  field: 'PK',
                  composite: ['employeeId'],
                  template: 'employee#${employeeId}',
                  casing: 'none',
              },
              sk: {
                  field: 'SK',
                  composite: ['taskId'],
                  template: 'task#${taskId}',
                  casing: 'none',
              },
          },
      },
  },
  { table }
);

const app = new Service({ employee, task });
const employeeId = "1233321";

// use a collection to query more than one entity at a time
app.collections
  .tasks({ employeeId })
  .go({pages: 'all'});

Expected behavior It should return all tasks assigned to employee, along with employee all details (i.e employee metadata)

Errors

Invalid use of a collection on index "(Primary Index)". The pk field "PK" shares a field name with an attribute defined on the Entity, and therefore the index is not allowed to participate in a Collection. Please either change the field name of the attribute, or remove all collection(s) from the index.

tywalch commented 3 months ago

It depends on what you're trying to do. Are you trying to use ElectroDB with existing data? I would make different recommendations depending on if this is a greenfield use case or trying to match an existing model.

rdzidziguri commented 3 months ago

I have a similar situation, but in my case, it is not a greenfield project, so I am limited in schema shape; also, as my team learned, one needs to be prepared that as soon as you update even a single attribute, ElectroDB is going to add **__edb_*** attributes. However, I assume that this can not be avoided; however, some schemas already have patterns to track versions and entity types (it would be nice if those properties could be remapped to existing attributes).

I would say that for one, using the library for the first time, At the same time, it works nicely with small to medium schemas if you have table schema and 3 GSI s along with 3 LSI, and for the sake of example, let's say one LSI is a location, where it is like <country>#<State>#<city>#<zip> not only does it require some advanced getter plus filed overrides but also this type of approach which is standard for DDB becomes exceptionally challenging. It is a great library. Still, with an actual single table schema, as probably any other opinionated library, the limitations imposed by this specific one must be considered before deciding on existing projects.

tywalch commented 3 months ago

There are options to map to a current schema, could you maybe share an existing item and I can try to demonstrate? As for your second point, could you provide an example of the challenge you're experiencing?

rdzidziguri commented 3 months ago

I would instead let the original thread author respond to this (I do not want to take over the issue :) ), but in general, a considerable table (and here I am talking about millions of entries ) in a single table schema would be a prevalent practice to design the following way and use keys to store composite query details and use for example Batch get to get parent entities as well as all the child items. To demonstrate the approach, let's take an example of an Airline and flight management system where PK could be the airline company ID, and individual flights would be

PK	SK	meta
AIRLN#	meta	{...}
AIRLN#	FLIGHT#\<date>	{...}
AIRLN#	FLIGHT#\<date>	{...}
AIRLN#	ASSETS#\<asset_id>	{...}

In this case, the access pattern is to get all the upcoming flights and airline information for the given company. Again, I apologize if this is not precisely what the thread author asked. I believe it is close enough, so in this case, one would be issuing Batch get to receive data where SK = meta as well as SK BEGINS WITH FLIGHT#

apoorv-yadav commented 3 months ago

Hey @rdzidziguri perfectly describes my issue here. @tywalch, Its quite similar to my case. I would like to have all the tasks of an employee along with employee details (meta) and I can't change my schema (unfortunately).

BatchGet or collections either of them will work for me.... Looking forwards to your response.

tywalch commented 3 months ago

@rdzidziguri (and @apoorv-yadav)

Are you using the PK field as companyId or do you also have a separate companyId field? Same question for your SK. For example, which item below matches your item?:

Option 1

{
    "PK": "AIRLN#123",
    "SK": "FLIGHT#2024-07-22"
}

Option 2

{
    "PK": "AIRLN#123",
    "SK": "FLIGHT#2024-07-22",
    "compani_ID": "123",
    "date": "2024-07-22"
}

apoorv-yadav commented 3 months ago

Hello @tywalch , in my case its option 1

Option 1

{
    "PK": "AIRLN#123",
    "SK": "FLIGHT#2024-07-22"
}

rdzidziguri commented 3 months ago

It is Option 1 for me as well. Also, let me explain some of the existing arch decisions there. As you know, DDB is not an excellent toy for cost optimization and is even crazy with Global Tables on. Hence, minimizing storage becomes essential at some point, and one has to squish data as much as possible, so, just like @apoorv-yadav, I am also firmly bound to an existing schema. minimizing the number of attributes

makes Global Table replication latency smaller
saves costs
It allows retrieving more data with the same command, considering that 4kb RCU is applied for eventual consistency.

{
    "PK": "AIRLN#123",
    "SK": "FLIGHT#2024-07-22"
}

tywalch commented 3 months ago

Ok, so the attributes involved in your keys don't exist as another item; with your existing implementation, do you "extract" the item from the key value on retrieval? Like do you turn FLIGHT#2024-07-22 into 2024-07-22 or AIRLN#123 into 123?

rdzidziguri commented 3 months ago

In my case, yes, this is how it is supposed to go.

FLIGHT#2024-07-22 into 2024-07-22 and AIRLN#123 into 123

rdzidziguri commented 3 months ago

I assume the feature is not supported, and we will have to look for alternatives, as the issue has been open for a while now. Good luck to the thread author, but we will probably move on to native SDK implementation using DDB low-level queries.

tywalch / electrodb

Collections issue : Trying to query across entities using collections but getting error #407