human-pangenomics / hprc-data-explorer

Move annotations to their own entity tab #48

Closed NoopDog closed 2 days ago

NoopDog commented 1 week ago

Need

Assemblies and annotations may be accessed separately, so we want to make it clear how to access each type.

Approach

  1. Create a new annotations source file
  2. Use it to create a new annotations tab
  3. Remove the annotation columns from the assemblies tab

Create a New Annotations source file

With fields:

sample_id, haplotype, reference, annotation_type, file_location

We might be able to reuse some of the code that extracted the annotations and joined them to the assemblies.

  1. Start from the annotation_index folder.
  2. Process each file to extract the required fields.
    • Not all files have a haplotype or reference. These should map to Unspecified in the app, so I think this means leaving the value empty if the column or value is missing.
    • flagger gets split into three rows. Use these as the annotation types:
      • flagger_unreliable_only_no_MT_file_location,
      • flagger_unreliable_only_file_location,
      • flagger_all_file_location
  3. Append all rows to a single annotations file.
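
The extraction steps above could be sketched roughly as follows. This is only an illustration, not the existing script: the function names `extract_rows` and `write_annotations`, the assumption that non-flagger index files expose a single `file_location` column, and the idea of passing the annotation type in per file are all hypothetical. The three flagger column names and the five output fields come from this issue.

```python
import csv
import io

# Output columns listed in this issue.
OUTPUT_FIELDS = ["sample_id", "haplotype", "reference", "annotation_type", "file_location"]

# flagger file-location columns named in this issue; each becomes its own row.
FLAGGER_COLUMNS = [
    "flagger_unreliable_only_no_MT_file_location",
    "flagger_unreliable_only_file_location",
    "flagger_all_file_location",
]


def extract_rows(index_rows, annotation_type):
    """Yield output rows for one index file (hypothetical helper).

    index_rows is an iterable of dicts, e.g. from csv.DictReader.
    annotation_type is used for non-flagger files (assumed to be known per file).
    """
    for row in index_rows:
        # Missing columns or values are left empty so the app can show "Unspecified".
        base = {
            "sample_id": row.get("sample_id") or "",
            "haplotype": row.get("haplotype") or "",
            "reference": row.get("reference") or "",
        }
        if any(col in row for col in FLAGGER_COLUMNS):
            # Split a flagger row into one row per populated file-location column.
            for col in FLAGGER_COLUMNS:
                location = row.get(col) or ""
                if location:
                    yield {**base, "annotation_type": col, "file_location": location}
        else:
            # Assumption: other index files carry a single file_location column.
            yield {
                **base,
                "annotation_type": annotation_type,
                "file_location": row.get("file_location") or "",
            }


def write_annotations(rows, out_file):
    """Write all collected rows to a single annotations file."""
    writer = csv.DictWriter(out_file, fieldnames=OUTPUT_FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```

Rows from every index file would be collected into one list and passed to `write_annotations` once, so the header is written a single time.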

Use the file to populate an "Annotations" tab.

  1. Place this after the "Assemblies" tab
  2. Add all columns
  3. Add filters for sample id, haplotype, reference, and annotation type
  4. Make the file_location column similar to the others, with copy to clipboard and truncated length.

Remove the annotation information from the Assemblies tab

  1. Update the Python script to omit the join to the annotations
  2. Remove any annotation rows or columns in the table
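
If any annotation columns remain in the assemblies data after the join is removed, they could be stripped with something like the sketch below. The helper name `drop_annotation_columns` is hypothetical, and the caller is assumed to know which column names came from the annotations.

```python
def drop_annotation_columns(assembly_rows, annotation_columns):
    """Return copies of the assembly rows without the given annotation columns."""
    excluded = set(annotation_columns)
    return [
        {key: value for key, value in row.items() if key not in excluded}
        for row in assembly_rows
    ]
```

The input rows are left unmodified, so the original data stays available for comparison while the change is being checked.
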
NoopDog commented 2 days ago

Complete