isolate and isolate_mutation tables don't support NGS and ENA sequence

hivdb / covid-drdb

MIT License


isolate and isolate_mutation tables don't support NGS and ENA sequence #15

Closed KaimingTao closed 2 years ago

KaimingTao commented 2 years ago

For example, Williamson21.

philiptzou commented 2 years ago

Are you trying to record the accessions from SRA/ENA? If so, just use the genbank_accession column. No program currently uses that column; it exists only so we can link back to the original sequences.

KaimingTao commented 2 years ago

A similar issue arises when two or more sequences come from the same day but were extracted from different places of the body. For example, Williamson21, Day 155.

KaimingTao commented 2 years ago

Lohr21 didn't specify the sequence; it only provided the prevalence of mutations for one day. Currently I have two options: 1) split the mutations across different days to pass validation, or 2) combine all the mutations under one iso_name.

KaimingTao commented 2 years ago

After discussion, here are some conclusions.

1) For NGS mutations, save count and total to indicate the percentage.
2) For sequences from different places of the body, merge the mutations and use the consensus mutation list.
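
To illustrate the first conclusion, here is a minimal Python sketch of storing count and total for an NGS mutation and deriving the percentage from them. The class name NGSMutation, its fields, and the sample values are hypothetical; they are not taken from the covid-drdb schema or from any cited study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NGSMutation:
    """Hypothetical record for one NGS mutation (not the real schema)."""
    gene: str
    position: int
    amino_acid: str
    count: int  # reads supporting the mutation
    total: int  # reads covering the position

    @property
    def percent(self) -> float:
        # The percentage is derived, so only count and total are stored.
        if self.total <= 0:
            raise ValueError("total must be positive")
        if not 0 <= self.count <= self.total:
            raise ValueError("count must be between 0 and total")
        return 100.0 * self.count / self.total


# Made-up values for illustration only.
mut = NGSMutation(gene="S", position=484, amino_acid="K", count=150, total=1000)
print(f"{mut.gene}:{mut.position}{mut.amino_acid} = {mut.percent:.1f}%")
```

Keeping the raw count and total, rather than only the percentage, preserves the read depth behind each prevalence value.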

philiptzou commented 2 years ago

This rule should be extended to samples collected from the same person on the same day. @KaimingTao
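
One way to picture merging samples from the same person on the same day is sketched below in Python. The thread does not define how the consensus mutation list is computed, so this sketch assumes a mutation is kept when it appears in at least a given fraction of the samples in a group. The function name consensus_by_person_day, the min_fraction parameter, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict


def consensus_by_person_day(samples, min_fraction=0.5):
    """Group samples by (person, day) and merge their mutations.

    Assumption: "consensus" means a mutation is present in at least
    min_fraction of the samples in the group. The thread leaves the
    exact rule open.
    """
    groups = defaultdict(list)
    for person, day, mutations in samples:
        groups[(person, day)].append(set(mutations))

    merged = {}
    for key, mutation_sets in groups.items():
        counts = Counter(m for s in mutation_sets for m in s)
        needed = min_fraction * len(mutation_sets)
        merged[key] = sorted(m for m, n in counts.items() if n >= needed)
    return merged


# Made-up samples: (person, collection day, mutations).
samples = [
    ("P1", 10, ["S:E484K", "S:N501Y"]),
    ("P1", 10, ["S:E484K"]),
    ("P1", 10, ["S:E484K", "S:Q493R"]),
    ("P1", 20, ["S:N501Y"]),
]
result = consensus_by_person_day(samples)
print(result)
```

With the default threshold and only two samples in a group, a mutation seen in either sample is kept, so the result is the union; a stricter min_fraction moves the rule toward an intersection.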