Closed corneliusroemer closed 1 month ago
We plan to annotate our ENA depositions with PP metadata, right (the PP accession, as some cross-reference)? Can't we use that?
It probably doesn't end up in the NCBI Virus export, so we can't do that easily afaict
@anna-parker
Hmm, does this mean we could be failing to ingest some other quite important data? (Not a criticism - just for understanding; I guess I previously thought we were capturing 100% of the data)
(and it's a genuine question - maybe this would be the only thing that we are not capturing)
Also, we shouldn't submit ingested sequences, so it wouldn't be an infinite loop (but yeah, it would not be good!)
@theosanderson Yeah, we only ingest whatever shows up in the NCBI Virus output. There's a bunch of stuff that doesn't make it through - it all depends on what NCBI Virus parses from the GenBank records.
See the separate issue I just made: #2834
We need to find out the nucleotide accessions that correspond to ENA-deposited sequences so that ingest can ignore them (otherwise we end up with an infinite loop).
This might require some changes to ENA submission so that we can find out the nucleotide accessions from the GCA assembly accessions.
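A rough sketch of what the ignore step could look like on the ingest side, assuming we already have the nucleotide accessions of our own ENA depositions from somewhere. The function names, the `accession` record field, and the example accessions are all hypothetical placeholders for illustration, not the actual ingest code:

```python
def strip_version(accession: str) -> str:
    """Drop a trailing '.N' version suffix, e.g. 'AB000001.1' -> 'AB000001'."""
    return accession.split(".", 1)[0]


def filter_own_depositions(records, own_accessions):
    """Keep only records whose accession is not one of our own ENA depositions.

    `records` is assumed to be a list of dicts with an 'accession' key;
    `own_accessions` is assumed to be the nucleotide accessions we deposited.
    """
    excluded = {strip_version(a) for a in own_accessions}
    return [r for r in records if strip_version(r["accession"]) not in excluded]


# Made-up accessions, purely for illustration.
records = [
    {"accession": "AB000001.1"},
    {"accession": "AB000002.1"},
]
kept = filter_own_depositions(records, own_accessions=["AB000002"])
```

Comparing on the unversioned accession is a design choice here, so that a later revision of one of our own depositions would still be skipped; whether that is the right behaviour is open.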
For reference:
@anna-parker (this is a test)