Open theosanderson opened 1 month ago
We would like this field to be the earliest date of:
Do you have an idea how we can implement this? Loculus doesn't know about an INSDC release date and the pipeline cannot possibly know about the Pathoplexus/Loculus release date for the first version as it runs before a sequence is released.
I think this would need to be a built-in field, like the Loculus release date itself. So I guess it would be computed in get-released-data by taking the earliest of these fields. Loculus does know the INSDC release date for ingested sequences, which is the only relevant case for this issue.
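To make "taking the earliest of these fields" concrete, here is a minimal sketch. It assumes the candidate dates arrive as ISO 8601 strings (YYYY-MM-DD) and that an unset field is `None` or an empty string. The function name `earliest_release_date` is made up for illustration and is not existing Loculus code.

```python
from datetime import date
from typing import Optional


def earliest_release_date(*candidates: Optional[str]) -> Optional[str]:
    """Return the earliest ISO 8601 date among the candidates.

    Unset values (None or "") are ignored. Returns None if no candidate is set.
    """
    parsed = [date.fromisoformat(c) for c in candidates if c]
    return min(parsed).isoformat() if parsed else None
```

For an ingested sequence this would be called with the INSDC release date and the Loculus release date. For a direct submission only the Loculus release date is set, so that is what comes back.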
What do you think about the following idea:
We implement a feature that allows an admin to (optionally) specify a script or Docker image that will be called by silo_import_job.sh before starting the SILO preprocessing. For Pathoplexus, this would be a step between /get-released-data and SILO preprocessing that modifies the data file and computes the date. It's not the most performant/optimized solution but highly flexible.
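A rough sketch of what such a hook script could look like, under several assumptions: that the released data is newline-delimited JSON with a `metadata` object per record, and that the relevant fields are called `ncbiReleaseDate` and `releasedDate`. The target field name `earliestReleaseDate` and the function names are hypothetical. None of this is an existing Loculus interface.

```python
import json
from typing import Iterable, Iterator, TextIO

# Assumed field names; adjust to the actual metadata schema.
SOURCE_FIELDS = ("ncbiReleaseDate", "releasedDate")
TARGET_FIELD = "earliestReleaseDate"  # hypothetical new field


def add_earliest_release_date(lines: Iterable[str]) -> Iterator[str]:
    """Add TARGET_FIELD to each NDJSON record, yielding the updated lines."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        metadata = record.setdefault("metadata", {})
        # Keep only fields that are actually set.
        dates = [metadata[f] for f in SOURCE_FIELDS if metadata.get(f)]
        # YYYY-MM-DD strings sort chronologically, so min() gives the earliest.
        metadata[TARGET_FIELD] = min(dates) if dates else None
        yield json.dumps(record)


def run(src: TextIO, dst: TextIO) -> None:
    """Stream records from src to dst, one JSON object per line."""
    for out_line in add_earliest_release_date(src):
        dst.write(out_line + "\n")
```

The import job could then pipe the downloaded file through `run(sys.stdin, sys.stdout)` before handing it to SILO preprocessing. Streaming line by line keeps memory use flat, at the cost of parsing and re-serializing every record.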
Currently on Pathoplexus we default to showing an "NCBI release date" field. This is good because, if we used the Pathoplexus release date field, all sequences ingested from INSDC would have the same release date, which wouldn't be useful. But it's bad because it is undefined for sequences submitted directly to Pathoplexus. IMO we should create a consensus field which is the NCBI release date if set (which will generally also be the Pathoplexus release date for items submitted to INSDC after launch) and otherwise the Pathoplexus release date. I'm raising this in Loculus as the code implementation would be here.