0x4007 commented 11 months ago

Overview

We need a robust schema to manage settlements, which include both credits and debits in our payments ledger.
Every table should start with the following columns: id, created, updated and have foreign keys in the last columns.
We also should split off the permits into a separate table, and associate them with credits as needed.
- Perhaps in the future we can allow for credits without permits.
The bot is expected to be stable after these changes are implemented which implies that all of the functions that interact with the database should also be refactored to accomodate this new architecture.

Schema

my_modified_backup.sql.zip

supabase com_dashboard_project_wfzpewmlyiozupulbuur_database_schemas (1)


access
Access control now has a column for JSON arrays, so that any project can set access control for fully customizable labels.



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT now() NOT NULL


updated
timestamp with time zone DEFAULT now() NOT NULL


user_id
integer NOT NULL


location_id
integer NOT NULL


multiplier
smallint DEFAULT '1'::smallint NOT NULL


labels
json



credits
Permits are optional. Perhaps we can assume XP calculations for credits without permits. Alternatively we need to think about other ways to compensate. For Gnosis Safe "permits", what data can do store instead of permits? rfc @rndquu



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL


updated
timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL


amount
numeric NOT NULL


permit_id
integer



debits
No remarks



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT now() NOT NULL


updated
timestamp with time zone DEFAULT now() NOT NULL


amount
numeric NOT NULL


location_id
integer



location
Represents where the event occurred. org/repo/issue/comment are all optional fields, because I anticipate that we will track only the most granular level of detail. For example, if we are tracking a comment, we will not need to track the issue or repository.
We are using node_ids because GitHub does not allow direct lookup of resources using numerical IDs. Instead, we would have to "search" for those results.
I'm unsure if it makes the most sense to consolidate to a single column for the node_id or if we should keep them separate. I'm leaning towards keeping them separate, because it will be easier to query for specific events. For example, if we want to see all events that occurred on a specific repository, we can query for node_id_repository.
I included a URL for easy auditing while we are in development.



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL


updated
timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL


node_id_organization
text


node_id_repository
text


node_id_issue
text


node_id_comment
text


url
text



logs
I did not put significant thought into this, but I figured we could use it for generic remarks/comments in the database and dump everything into log_entry.



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT now() NOT NULL


updated
timestamp with time zone DEFAULT now() NOT NULL


log_entry
text


location_id
integer



partners
location_id is intended to save the partner organization.



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT now() NOT NULL


updated
timestamp with time zone DEFAULT now() NOT NULL


wallet_id
integer


location_id
integer NOT NULL



permits
Foreign keys to recognized tokens, partners, and users (beneficiary)
Just realized but not sure if I need to store the network for the transactions as well or if we should derive from token_id. 



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT now() NOT NULL


updated
timestamp with time zone DEFAULT now() NOT NULL


amount
numeric


nonce
numeric NOT NULL


deadline
numeric NOT NULL


signature
character(132) NOT NULL


token_id
integer


partner_id
integer


beneficiary_id
integer NOT NULL


transaction
character(66)


location_id
integer



tokens
Tokens take a lot of space. Let's make a recognized table of tokens. Perhaps we should also store the token symbol since they are unlikely to change.



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT now() NOT NULL


updated
timestamp with time zone DEFAULT now() NOT NULL


network
smallint NOT NULL


address
character(42) NOT NULL



users
I am debating on if we should store all their GitHub data. We don't have any planned use for it (other than generating embeddings for their solved issues) but even that isn't included in the original user schema which just had their profile information.



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT now() NOT NULL


updated
timestamp with time zone DEFAULT now() NOT NULL


wallet_id
integer



wallets
No remarks.



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT now() NOT NULL


updated
timestamp with time zone DEFAULT now() NOT NULL


address
character(42)


CONSTRAINT
wallets_address_check CHECK ((length(address) = 42))



settlements
No remarks.



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL


updated
timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL


user_id
integer NOT NULL


location_id
integer


credit_id
integer


debit_id
integer

Column Name	Type/Constraints
id	integer NOT NULL
created	timestamp with time zone DEFAULT now() NOT NULL
updated	timestamp with time zone DEFAULT now() NOT NULL
user_id	integer NOT NULL
location_id	integer NOT NULL
multiplier	smallint DEFAULT '1'::smallint NOT NULL
labels	json

Column Name	Type/Constraints
id	integer NOT NULL
created	timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL
updated	timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL
amount	numeric NOT NULL
permit_id	integer

Column Name	Type/Constraints
id	integer NOT NULL
created	timestamp with time zone DEFAULT now() NOT NULL
updated	timestamp with time zone DEFAULT now() NOT NULL
amount	numeric NOT NULL
location_id	integer

Column Name	Type/Constraints
id	integer NOT NULL
created	timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL
updated	timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL
node_id_organization	text
node_id_repository	text
node_id_issue	text
node_id_comment	text
url	text

Column Name	Type/Constraints
id	integer NOT NULL
created	timestamp with time zone DEFAULT now() NOT NULL
updated	timestamp with time zone DEFAULT now() NOT NULL
log_entry	text
location_id	integer

Column Name	Type/Constraints
id	integer NOT NULL
created	timestamp with time zone DEFAULT now() NOT NULL
updated	timestamp with time zone DEFAULT now() NOT NULL
wallet_id	integer
location_id	integer NOT NULL

Column Name	Type/Constraints
id	integer NOT NULL
created	timestamp with time zone DEFAULT now() NOT NULL
updated	timestamp with time zone DEFAULT now() NOT NULL
amount	numeric
nonce	numeric NOT NULL
deadline	numeric NOT NULL
signature	character(132) NOT NULL
token_id	integer
partner_id	integer
beneficiary_id	integer NOT NULL
transaction	character(66)
location_id	integer

Column Name	Type/Constraints
id	integer NOT NULL
created	timestamp with time zone DEFAULT now() NOT NULL
updated	timestamp with time zone DEFAULT now() NOT NULL
network	smallint NOT NULL
address	character(42) NOT NULL

Column Name	Type/Constraints
id	integer NOT NULL
created	timestamp with time zone DEFAULT now() NOT NULL
updated	timestamp with time zone DEFAULT now() NOT NULL
wallet_id	integer

Column Name	Type/Constraints
id	integer NOT NULL
created	timestamp with time zone DEFAULT now() NOT NULL
updated	timestamp with time zone DEFAULT now() NOT NULL
address	character(42)
CONSTRAINT	wallets_address_check CHECK ((length(address) = 42))

Column Name	Type/Constraints
id	integer NOT NULL
created	timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL
updated	timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL
user_id	integer NOT NULL
location_id	integer
credit_id	integer
debit_id	integer

Old Spec (For Reference)

Tables

I think it might make sense to do them in reverse order. Here I listed parents then children, but creating the children tables in order to link to the parent tables might be easiest for implementation.

[x] #808
[x] #809
[x] #810
[x] #811
[ ] #819

Implementation Notes

The Credits table now has a foreign key permitId that can link to the Permits table, allowing you to associate a credit with a permit if necessary. However, this is optional, so you can have credits without permits as well.
Set up foreign key constraints for creditId and debitId in the Settlements table.
Application logic should prevent a Settlements entry from having both a creditId and a debitId.
Adjust data types like BigInteger or Decimal for large amounts if needed.

0x4007 commented 11 months ago

Debating on this but still thinking about when we eventually make a command to allow repository maintainers to override the contributor's XP: we will need to create a new column in the settlements table. xpId or something. It will be a foreign key to an "XP Modifiers" table. There we can store all the details around that modification. For example: id, created time, updated time, amount.

We can exclude modifiers from the global leaderboard calculation (globalXP) so that its more difficult to exploit (i.e. collusion between an org and a contributor). We could even make a whistleblower incentive to make sure that people police each other for exploiting the XP system possibly.

Since it would be available in the settlements table, that means the penalties parent table can associate the user, orgId, repoId, issueId and commentURL to see exactly what happened.

wannacfuture commented 11 months ago

/start

ubiquibot[bot] commented 11 months ago

Too many assigned issues, you have reached your max of 2

seprintour commented 11 months ago

/start

ubiquibot[bot] commented 11 months ago



  
    
    
  
  
    Deadline
    Fri, 22 Sep 2023 05:23:00 UTC
  
  
    Registered Wallet
    0x3623338046b101ecEc741De9C3594CC2176f39E5

Tips:

Use /wallet 0x0000...0000 if you want to update your registered payment wallet address @user.
Be sure to open a draft pull request as soon as possible to communicate updates on your progress.
Be sure to provide timely updates to us when requested, or you will be automatically unassigned from the bounty.

seprintour commented 11 months ago

@pavlovcik There was already a Permit table, guess we should use that instead..

It has all of the fields here and more

0x4007 commented 11 months ago

@pavlovcik There was already a Permit table, guess we should use that instead..

It has all of the fields here and more

Thanks for highlighting this. The only concern I have with the current implementation is that it has an issue ID associated with every permit, but I want to allow the flexibility for us to issue permits without necessarily associating them to issue IDs.

I believe that my specification allows for more flexibility, but I also do not claim to be a database architecture expert.

I also just realized that ChatGPT jumbled up some of the rows. It left some permit specific data inside of the credits table.

seprintour commented 11 months ago

I also just realized that ChatGPT jumbled up some of the rows. It left some permit specific data inside of the credits table.

Yea, network, signature and owner is permit data.

Previous permit table contains repository data and issue id to pinpoint the exact origin of the permit. Are you saying we don't need that anymore.

Because none of the tables here refers to a repository (origin) of the payment

seprintour commented 11 months ago

Rfc

0x4007 commented 11 months ago

The parent table does. https://github.com/ubiquity/ubiquibot/issues/767

seprintour commented 11 months ago

The parent table does. #767

Okay then, this issue entail just creating the tables, moving permit data to the new table. Is creating functions to add these data to DB included? @pavlovcik

0x4007 commented 11 months ago

The bot should be stable after the database is changed out so yes it includes modifying all the functions that interact with the database.

0x4007 commented 11 months ago

/ask how long do you estimate that this task would take, in hours?

ubiquibot[bot] commented 11 months ago

! Error: This model's maximum context length is 16385 tokens. However, you requested 20407 tokens (4023 in the messages, 16384 in the completion). Please reduce the length of the messages or completion.

0x4007 commented 11 months ago

! Error: This model's maximum context length is 16385 tokens. However, you requested 20407 tokens (4023 in the messages, 16384 in the completion). Please reduce the length of the messages or completion.

@Keyrxng time for compression/prioritization? Not a great first real world attempt lol.

Prioritization order:

Current issue specification
Linked issue specification (in order of linked, the first link taking higher priority than the next link)
Current issue conversation
Linked issue conversations (same ordering system)

We should use a tokenization estimator to know how much we should exclude.

seprintour commented 11 months ago

@pavlovcik I'm adding penalties to this too?

I see the penalty issue was rolled onto this

0x4007 commented 11 months ago

Most likely but let me just get a little time to refine this spec. Sorry I realized after filing the issue that there were some mistakes in my specification. I might break this up into separate tasks.

0x4007 commented 11 months ago

/ask how long do you estimate that this task would take, in hours?

ubiquibot[bot] commented 11 months ago

! Error: This model's maximum context length is 16385 tokens. However, you requested 21257 tokens (4873 in the messages, 16384 in the completion). Please reduce the length of the messages or completion.

0x4007 commented 11 months ago

@seprintour I broke it down into four subtasks

seprintour commented 11 months ago

@seprintour I broke it down into four subtasks

Okay, credit depends on permit (because permit is going to be modified and it needs to be linked to the permit id after modification)

Settlements depends on credit and debit (If its not merged in this order, the supabase migration will fail to run automatically)

0x4007 commented 11 months ago

I noted that as well on the Tables section.

seprintour commented 11 months ago

Debits Table Schema #809

I also think the amount for debit and credit should be a float or big integer if we are converting decimal to bigint because permit amount is not always in whole numbers

0x4007 commented 11 months ago

Lets chat under that issue instead.

Keyrxng commented 11 months ago

4023 in the messages, 16384 in the completion

Same as other shout out, you've set the output to be the maximum token amount for that model. max_tokens represents output so you gotta drop the token limit to half or 2/3.

the actual context fed into GPT was only 4k tokens

0x4007 commented 11 months ago

I set to 8000 max tokens in the org config and it silent crashes

0x4007 commented 11 months ago

supabase com_dashboard_project_wfzpewmlyiozupulbuur_database_schemas (1)

just realized that ids are useless. need node_ids and graphql api

0x4007 commented 11 months ago

0x4007 commented 11 months ago

I finished the schema but i still need to verify the node id situation. I might need to change the ids under 'location' table to track references on github e.g. org/repo/issue/comment ids final-database.zip

0x4007 commented 11 months ago


access
Access control now has a column for JSON arrays, so that any project can set access control for fully customizable labels.



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT now() NOT NULL


updated
timestamp with time zone DEFAULT now() NOT NULL


user_id
integer NOT NULL


location_id
integer NOT NULL


multiplier
smallint DEFAULT '1'::smallint NOT NULL


labels
json



credits
Permits are optional. Perhaps we can assume XP calculations for credits without permits. Alternatively we need to think about other ways to compensate. For Gnosis Safe "permits", what data can do store instead of permits? rfc @rndquu



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL


updated
timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL


amount
numeric NOT NULL


permit_id
integer



debits
No remarks



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT now() NOT NULL


updated
timestamp with time zone DEFAULT now() NOT NULL


amount
numeric NOT NULL


location_id
integer



location
Represents where the event occurred. org/repo/issue/comment are all optional fields, because I anticipate that we will track only the most granular level of detail. For example, if we are tracking a comment, we will not need to track the issue or repository.
We are using node_ids because GitHub does not allow direct lookup of resources using numerical IDs. Instead, we would have to "search" for those results.
I'm unsure if it makes the most sense to consolidate to a single column for the node_id or if we should keep them separate. I'm leaning towards keeping them separate, because it will be easier to query for specific events. For example, if we want to see all events that occurred on a specific repository, we can query for node_id_repository.
I included a URL for easy auditing while we are in development.



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL


updated
timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL


node_id_organization
text


node_id_repository
text


node_id_issue
text


node_id_comment
text


url
text



logs
I did not put significant thought into this, but I figured we could use it for generic remarks/comments in the database and dump everything into log_entry.



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT now() NOT NULL


updated
timestamp with time zone DEFAULT now() NOT NULL


log_entry
text


location_id
integer



partners
location_id is intended to save the partner organization.



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT now() NOT NULL


updated
timestamp with time zone DEFAULT now() NOT NULL


wallet_id
integer


location_id
integer NOT NULL



permits
Foreign keys to recognized tokens, partners, and users (beneficiary)



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT now() NOT NULL


updated
timestamp with time zone DEFAULT now() NOT NULL


amount
numeric


nonce
numeric NOT NULL


deadline
numeric NOT NULL


signature
character(132) NOT NULL


token_id
integer


partner_id
integer


beneficiary_id
integer NOT NULL


transaction
character(66)


location_id
integer



tokens
Tokens take a lot of space. Let's make a recognized table of tokens. Perhaps we should also store the token symbol since they are unlikely to change.



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT now() NOT NULL


updated
timestamp with time zone DEFAULT now() NOT NULL


network
smallint NOT NULL


address
character(42) NOT NULL



users
I am debating on if we should store all their GitHub data. We don't have any planned use for it (other than generating embeddings for their solved issues) but even that isn't included in the original user schema which just had their profile information.



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT now() NOT NULL


updated
timestamp with time zone DEFAULT now() NOT NULL


wallet_id
integer



wallets
No remarks.



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT now() NOT NULL


updated
timestamp with time zone DEFAULT now() NOT NULL


address
character(42)


CONSTRAINT
wallets_address_check CHECK ((length(address) = 42))



settlements
No remarks.



Column Name
Type/Constraints




id
integer NOT NULL


created
timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL


updated
timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL


user_id
integer NOT NULL


location_id
integer


credit_id
integer


debit_id
integer

Column Name	Type/Constraints
id	integer NOT NULL
created	timestamp with time zone DEFAULT now() NOT NULL
updated	timestamp with time zone DEFAULT now() NOT NULL
user_id	integer NOT NULL
location_id	integer NOT NULL
multiplier	smallint DEFAULT '1'::smallint NOT NULL
labels	json

Column Name	Type/Constraints
id	integer NOT NULL
created	timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL
updated	timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL
amount	numeric NOT NULL
permit_id	integer

Column Name	Type/Constraints
id	integer NOT NULL
created	timestamp with time zone DEFAULT now() NOT NULL
updated	timestamp with time zone DEFAULT now() NOT NULL
amount	numeric NOT NULL
location_id	integer

Column Name	Type/Constraints
id	integer NOT NULL
created	timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL
updated	timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL
node_id_organization	text
node_id_repository	text
node_id_issue	text
node_id_comment	text
url	text

Column Name	Type/Constraints
id	integer NOT NULL
created	timestamp with time zone DEFAULT now() NOT NULL
updated	timestamp with time zone DEFAULT now() NOT NULL
log_entry	text
location_id	integer

Column Name	Type/Constraints
id	integer NOT NULL
created	timestamp with time zone DEFAULT now() NOT NULL
updated	timestamp with time zone DEFAULT now() NOT NULL
wallet_id	integer
location_id	integer NOT NULL

Column Name	Type/Constraints
id	integer NOT NULL
created	timestamp with time zone DEFAULT now() NOT NULL
updated	timestamp with time zone DEFAULT now() NOT NULL
amount	numeric
nonce	numeric NOT NULL
deadline	numeric NOT NULL
signature	character(132) NOT NULL
token_id	integer
partner_id	integer
beneficiary_id	integer NOT NULL
transaction	character(66)
location_id	integer

Column Name	Type/Constraints
id	integer NOT NULL
created	timestamp with time zone DEFAULT now() NOT NULL
updated	timestamp with time zone DEFAULT now() NOT NULL
network	smallint NOT NULL
address	character(42) NOT NULL

Column Name	Type/Constraints
id	integer NOT NULL
created	timestamp with time zone DEFAULT now() NOT NULL
updated	timestamp with time zone DEFAULT now() NOT NULL
wallet_id	integer

Column Name	Type/Constraints
id	integer NOT NULL
created	timestamp with time zone DEFAULT now() NOT NULL
updated	timestamp with time zone DEFAULT now() NOT NULL
address	character(42)
CONSTRAINT	wallets_address_check CHECK ((length(address) = 42))

Column Name	Type/Constraints
id	integer NOT NULL
created	timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL
updated	timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL
user_id	integer NOT NULL
location_id	integer
credit_id	integer
debit_id	integer

supabase com_dashboard_project_wfzpewmlyiozupulbuur_database_schemas (1)

0x4007 commented 11 months ago

supabase com_dashboard_project_wfzpewmlyiozupulbuur_database_schemas (2) 1695844219.zip

ubiquity / ubiquibot

Database Schema Design: October 2023 #787

Overview

Schema

access

credits

debits

location

logs

partners

permits

tokens

users

wallets

settlements

Old Spec (For Reference)

Tables

Implementation Notes

Tips:

access

credits

debits

location

logs

partners

permits

tokens

users

wallets

settlements


Deadline	Fri, 22 Sep 2023 05:23:00 UTC
Registered Wallet	0x3623338046b101ecEc741De9C3594CC2176f39E5