instedd / cdx

Connected Diagnostics Platform
https://cdx.io

New fields for Batches entity #1863

Closed leandroradusky closed 1 year ago

leandroradusky commented 1 year ago

The Batch entity in CDx is missing some of the data that needs to be uploaded to NIH.

These new columns need to be added to batches:

Field labels:

  1. Reference gene
  2. Target organism taxonomy ID
  3. Pango Lineage
  4. Virus World Health Organization's label

All of them are combo free text fields:

  1. The field should be autocomplete, but allow entering values that aren't in the list.
  2. The list should include all values that were entered in previous batches for the current institution.
  3. The field needs to be in the Batch and in the Sample.
  4. The list of options for Batches and Samples, should be the same.
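
The four rules above can be sketched as follows. This is only an illustration in Python, not the CDx implementation: `Record`, `suggestions`, and the in-memory list are hypothetical stand-ins for the Batch and Sample models, and pooling values from both entities is one reading of rule 4.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical stand-in for a Batch or Sample row (not the CDx model)."""
    institution_id: int
    values: dict = field(default_factory=dict)


def suggestions(records, institution_id, field_name, prefix=""):
    """Return the distinct values previously entered for one institution.

    The result only feeds the autocomplete list; the user may still type
    a value that is not in it (rule 1).
    """
    seen = set()
    for record in records:
        # Rule 2: only values from the current institution are offered.
        if record.institution_id != institution_id:
            continue
        value = (record.values.get(field_name) or "").strip()
        # Keep non-empty values matching what the user has typed so far.
        if value and value.casefold().startswith(prefix.casefold()):
            seen.add(value)
    return sorted(seen, key=str.casefold)


# Rules 3 and 4: batches and samples share one pool of options.
batches = [
    Record(1, {"reference_gene": "N"}),
    Record(1, {"reference_gene": "ORF1ab"}),
    Record(2, {"reference_gene": "S"}),
]
samples = [Record(1, {"reference_gene": "n2"})]
all_records = batches + samples
```

With this data, `suggestions(all_records, 1, "reference_gene", "n")` offers the values entered for institution 1 that start with "n", regardless of whether they came from a batch or a sample, and an institution with no earlier entries gets an empty list.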

diegoliberman commented 1 year ago

@leandroradusky could you briefly explain what those fields are about?

leandroradusky commented 1 year ago

I will do it in the main text of the issue when we have the full details. But as a preview: the two mentioned above are the names that the viruses within the batches have in global reference databases. WHO_label is the name that the World Health Organization gives to the virus. Similarly, PANGO stands for Phylogenetic Assignment of Named Global Outbreak Lineages, a database of reference names widely used in this scenario.

ysbaddaden commented 1 year ago

Should the fields be free text, or would we like to have a database of all these names? The former is prone to human typos, while the latter means we must keep the databases updated, which we probably don't want to do.

diegoliberman commented 1 year ago

@leandroradusky what do you say if we make these fields dropdowns/autocomplete, plus an "other" option for entering the value manually if it is not found? Ideally, in the future the system should have some kind of support for adding new options to the tables, but users would still have the "other" option so they don't get blocked.
In the future, if they find they frequently need to add options, a feature could be implemented for administering the values from the UI.

leandroradusky commented 1 year ago

This can be done. We should confirm with them where they get these labels for the different DBs, and analyze whether we can have a table or whether we would have to be constantly updating them from online resources. The taxonomy ID, I believe, will update each time there is a new strain of COVID or whatever other organism, and they will need to add options "constantly" (there isn't one new strain per week now, but at the peak of COVID there was).

ysbaddaden commented 1 year ago

Just a note that the estimation will be completely different: syncing a local copy of a database is more complex than having free text inputs.

diegoliberman commented 1 year ago

The conclusion on whether the fields should be free text or combos is described in the "All of them are combo free text fields" section of this ticket.

bolom commented 1 year ago

@diegoliberman I still need some clarification.

The field should be autocomplete, but allow entering values that aren't in the list. The list should include all values that were entered in previous batches for the current institution.

If no values have been previously entered in these fields for batches from the same institution, the autocomplete feature will not have any suggestions to offer.

Conversely, if there have been values entered in these fields for past batches from the same institution, the autocomplete feature will provide these as suggestions when inputting data into these fields.

Is that correct?

The field needs to be in the Batch and in the Sample.

Should these new fields (reference_gene, target_organism_taxonomy_id, pango_lineage, who_label) also be in the Sample model?

The list of options for Batches and Samples, should be the same.

For example, if you have a Batch with a reference_gene of "N", and later you are entering a Sample, "N" should come up as a suggestion for the reference_gene in the Sample as well, and vice versa.

Is that correct?

diegoliberman commented 1 year ago

@bolom the answer for the 3 questions is --> correct

:)

sardar-usman commented 1 year ago

@bolom It is working fine.