qbic-pipelines / metapep

From metagenomes to epitopes and beyond
MIT License

Data model - database #18

Open skrakau opened 3 years ago

lkuchenb commented 3 years ago

Current draft proposal

[Image: metapep data model diagram]

skrakau commented 3 years ago

prediction_id is not needed ...

skrakau commented 3 years ago

Regarding the size of the peptide tables and memory, for my current datasets:

# proteins: 1,855,616

# non-unique peptides (9mers) across proteins (multiple occurrences within one protein not counted!): 552,451,599

# unique peptides: 392,722,935

Peak memory for generate_peptides: peak_vmem=176,900,708
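For readers who want to reproduce the two peptide counts above, here is a minimal sketch of how they could be computed. It is not the pipeline's actual `generate_peptides` code; the function name `count_peptides` and the dict-of-sequences input are assumptions made for illustration.

```python
from typing import Dict, Set, Tuple


def count_peptides(proteins: Dict[str, str], k: int = 9) -> Tuple[int, int]:
    """Return (non_unique, unique) k-mer counts for a set of proteins.

    non_unique counts a peptide once per protein it occurs in, so repeated
    occurrences within the same protein are not counted.
    unique counts each distinct peptide once across all proteins.
    The input format (protein id -> amino acid sequence) is hypothetical.
    """
    non_unique = 0
    unique: Set[str] = set()
    for seq in proteins.values():
        # Distinct k-mers of this protein; empty if the sequence is shorter than k.
        per_protein = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        non_unique += len(per_protein)
        unique |= per_protein
    return non_unique, len(unique)


if __name__ == "__main__":
    toy = {"p1": "ABCABC", "p2": "ABCD"}
    print(count_peptides(toy, k=3))
```

Holding every unique peptide in an in-memory set, as this sketch does, is the naive approach. As a rough lower bound, 392,722,935 unique 9-mers take about 3.5 GB as raw ASCII (9 bytes each) before any container or index overhead.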

lkuchenb commented 3 years ago

New model that introduces entities as an additional link between microbiomes and proteins. An entity models the linking unit, i.e. a taxon, a MAG/bin, or an assembly contig.

[Image: metapep data model diagram, revised]

New color coding:

Orange -> provided or pre-computed entities

Gray -> associations

Purple -> pipeline output
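Since the diagram is not reproduced here, the following is a rough sketch of how an entity layer between microbiomes and proteins could look as relational tables. It is not the schema from the proposal: every table and column name is an assumption chosen for illustration, except `protein_orig_id`, which is named in this thread.

```python
import sqlite3

# Hypothetical schema: entities (taxa, MAGs/bins or contigs) sit between
# microbiomes and proteins, connected through two association tables.
SCHEMA = """
CREATE TABLE microbiome (
    microbiome_id INTEGER PRIMARY KEY,
    microbiome_name TEXT NOT NULL
);
CREATE TABLE entity (
    entity_id INTEGER PRIMARY KEY,
    entity_name TEXT NOT NULL
);
CREATE TABLE protein (
    protein_id INTEGER PRIMARY KEY,
    protein_orig_id TEXT NOT NULL
);
CREATE TABLE microbiome_entity (
    microbiome_id INTEGER NOT NULL REFERENCES microbiome(microbiome_id),
    entity_id INTEGER NOT NULL REFERENCES entity(entity_id),
    PRIMARY KEY (microbiome_id, entity_id)
);
CREATE TABLE entity_protein (
    entity_id INTEGER NOT NULL REFERENCES entity(entity_id),
    protein_id INTEGER NOT NULL REFERENCES protein(protein_id),
    PRIMARY KEY (entity_id, protein_id)
);
"""


def proteins_of_microbiome(conn: sqlite3.Connection, microbiome_id: int) -> list:
    """List original protein ids reachable from a microbiome via its entities."""
    rows = conn.execute(
        """
        SELECT DISTINCT p.protein_orig_id
        FROM microbiome_entity AS me
        JOIN entity_protein AS ep ON ep.entity_id = me.entity_id
        JOIN protein AS p ON p.protein_id = ep.protein_id
        WHERE me.microbiome_id = ?
        ORDER BY p.protein_orig_id
        """,
        (microbiome_id,),
    ).fetchall()
    return [row[0] for row in rows]


def demo() -> list:
    """Build a toy in-memory database and query one microbiome."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO microbiome VALUES (1, 'sample_A')")
    conn.executemany("INSERT INTO entity VALUES (?, ?)", [(1, "bin_1"), (2, "bin_2")])
    conn.executemany(
        "INSERT INTO protein VALUES (?, ?)",
        [(1, "prot_x"), (2, "prot_y"), (3, "prot_z")],
    )
    conn.executemany("INSERT INTO microbiome_entity VALUES (?, ?)", [(1, 1), (1, 2)])
    # prot_x occurs in both bins, so it must be de-duplicated in the query.
    conn.executemany("INSERT INTO entity_protein VALUES (?, ?)", [(1, 1), (2, 1), (2, 2)])
    return proteins_of_microbiome(conn, 1)


if __name__ == "__main__":
    print(demo())
```

In this sketch a protein shared by several entities is stored once and only the association rows multiply, which is one possible reason to keep associations as separate tables.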

skrakau commented 3 years ago

protein_orig_id is missing