jbloomlab / SARS2-mut-fitness

Observed substitution counts of SARS-CoV-2 compared to those expected under the mutation rates
MIT License
19 stars 5 forks

Export fitness as a `JSON` file #38

Closed WillHannon-MCB closed 1 year ago

WillHannon-MCB commented 1 year ago

This pull request adds a new rule to the pipeline that stores the amino acid fitness estimates in a JSON file. The file also includes metadata and summary statistics of fitness at each site. The intent is to provide a data format that can be loaded directly by the Chodera lab's interactive protein visualization tool.

The stored amino acid fitness estimates are filtered to exclude any mutation whose expected count falls below the minimum expected count defined in the config file. Stop codons are also omitted when calculating the summary statistics at each site.
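As a rough illustration of the filtering and per-site summaries described above, here is a minimal sketch using only the Python standard library. The record fields (`gene`, `site`, `aa`, `fitness`, `expected_count`), the output layout, the choice of summary statistics, and the threshold value are all assumptions made for the example; they are not taken from the pipeline's actual rule or config.

```python
import json
from collections import defaultdict
from statistics import mean

# Hypothetical input records; the field names are illustrative assumptions.
records = [
    {"gene": "S", "site": 484, "aa": "K", "fitness": -0.5, "expected_count": 40.0},
    {"gene": "S", "site": 484, "aa": "Q", "fitness": -1.5, "expected_count": 25.0},
    {"gene": "S", "site": 484, "aa": "*", "fitness": -6.0, "expected_count": 30.0},
    {"gene": "S", "site": 484, "aa": "A", "fitness": 0.2, "expected_count": 3.0},
]

# Assumed to be read from the config file in the real pipeline.
MIN_EXPECTED_COUNT = 10


def build_fitness_dict(records, min_expected_count):
    """Filter mutations by expected count and summarize fitness per site."""
    # Drop mutations whose expected count is below the threshold.
    kept = [r for r in records if r["expected_count"] >= min_expected_count]

    # Group the remaining mutations by (gene, site).
    by_site = defaultdict(list)
    for r in kept:
        by_site[(r["gene"], r["site"])].append(r)

    sites = []
    for (gene, site), muts in sorted(by_site.items()):
        # Stop codons ("*") stay in the mutation list but are left out
        # of the summary statistics.
        non_stop = [m["fitness"] for m in muts if m["aa"] != "*"]
        summary = None
        if non_stop:
            summary = {
                "mean": mean(non_stop),
                "min": min(non_stop),
                "max": max(non_stop),
            }
        sites.append(
            {
                "gene": gene,
                "site": site,
                "summary": summary,
                "mutations": [
                    {"aa": m["aa"], "fitness": m["fitness"]} for m in muts
                ],
            }
        )

    return {
        "metadata": {"min_expected_count": min_expected_count},
        "sites": sites,
    }


result = build_fitness_dict(records, MIN_EXPECTED_COUNT)
json_text = json.dumps(result, indent=2)
```

In this sketch the low-count `A` mutation is dropped entirely, while the stop codon is kept in the mutation list and excluded only from the summary. Whether the actual rule stores stop codons in the output is not stated in this thread.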

WillHannon-MCB commented 1 year ago

@jbloom, the uncompressed file is ~25M, so pretty big. The compressed file is ~1.4M.

jbloom commented 1 year ago

That's not too huge. Can you write the files compressed and then track those? I think this is the best approach. Let me know when that is done and I will approve the request.

WillHannon-MCB commented 1 year ago

Alright, I've gzipped the files and added them to git lfs.
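For reference, writing the JSON directly in gzip-compressed form might look like the sketch below. It uses only the standard-library `gzip` and `json` modules; the file name `aa_fitness.json.gz` and the payload are hypothetical, and this is not necessarily how the pipeline rule writes its output.

```python
import gzip
import json
import os
import tempfile


def write_json_gz(obj, path):
    """Serialize obj as JSON straight into a gzip-compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(obj, f)


def read_json_gz(path):
    """Load JSON back from a gzip-compressed text file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)


# Hypothetical payload: repetitive records like these compress well.
data = {
    "sites": [
        {"gene": "S", "site": i, "mean_fitness": -1.0} for i in range(1, 1001)
    ]
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "aa_fitness.json.gz")  # hypothetical file name
    write_json_gz(data, path)
    roundtrip = read_json_gz(path)
    compressed_size = os.path.getsize(path)

uncompressed_size = len(json.dumps(data).encode("utf-8"))
```

Files written this way can then be tracked with Git LFS, for example with `git lfs track "*.json.gz"`; the exact pattern used in this repository is not shown in the thread.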