griffithlab / civic-server

Backend Server for CIViC Project
MIT License

nightly-VariantSummaries: different number of columns? #715

Closed sigven closed 1 year ago

sigven commented 1 year ago

Hi,

Thanks for a great resource. When trying to parse the nightly-VariantSummaries TSV file in R, I get a warning about some 9-10 records containing more columns than expected. I guess this might be caused by a missing value or a formatting issue(?). Either way, it would be convenient to ensure that the number of columns is consistent for all records in this file.

Here is a simple command that showcases it:

(base) sigven$ awk 'BEGIN{FS="\t"}{print NF;}' nightly-VariantSummaries-20221118.tsv | sort | uniq -c
1612 29
   5 30
   2 31
   2 33
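For anyone who wants to locate the offending records rather than just count them, the same check can be sketched in Python. This is a minimal sketch, not part of the original report: the function name `field_count_report` is made up for illustration, and it assumes the first line is the header and defines the expected column count. Like the awk command, it splits on raw tab characters and ignores any quoting.

```python
from collections import Counter


def field_count_report(lines, delimiter="\t"):
    """Tally fields per line and list the lines that differ from the header.

    `lines` is any iterable of text lines (for example an open file).
    Returns (expected, counts, bad_rows), where `expected` is the field
    count of the first line, `counts` maps field count -> number of lines,
    and `bad_rows` is a list of (line_number, field_count) pairs.
    """
    counts = Counter()
    bad_rows = []
    expected = None
    for line_no, line in enumerate(lines, start=1):
        stripped = line.rstrip("\r\n")
        # Mirror awk's NF: an empty line has zero fields.
        n_fields = len(stripped.split(delimiter)) if stripped else 0
        counts[n_fields] += 1
        if expected is None:
            expected = n_fields  # assumption: line 1 is the header
        elif n_fields != expected:
            bad_rows.append((line_no, n_fields))
    return expected, counts, bad_rows


# Usage with the file from the report (path taken from the command above):
# with open("nightly-VariantSummaries-20221118.tsv", encoding="utf-8") as fh:
#     expected, counts, bad_rows = field_count_report(fh)
#     print(expected, dict(counts), bad_rows)
```

The `bad_rows` list gives 1-based line numbers, so the affected records can be inspected directly in the TSV.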

Thanks in advance,

cheers, Sigve

acoffman commented 1 year ago

Hi @sigven and @pdiakumis

Thanks for bringing this to our attention, and apologies for the delayed response. CIViC development is taking place in the https://github.com/griffithlab/civic-v2/ repo these days, and so I initially missed this report.

We can confirm the issue you're seeing. It's the result of not properly joining the Assertion URL column in our export process. I have a fix ready that will go out in our next deploy, which is scheduled for next week. I will follow up here when the fix is live!
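To illustrate the kind of bug described here, the following is a hypothetical sketch only. It is not the CIViC export code, and the function names, the separator, and the example values are all assumptions. It shows how a record with several assertion URLs ends up with extra columns if the URLs are appended as separate fields instead of being joined into one cell.

```python
def build_row_unjoined(fixed_fields, assertion_urls):
    """Buggy variant: each URL becomes its own column, so row width varies."""
    return list(fixed_fields) + list(assertion_urls)


def build_row_joined(fixed_fields, assertion_urls, sep=", "):
    """Fixed variant: all URLs share one cell, so row width stays constant.

    `sep` is a hypothetical separator; the real export may use another.
    """
    return list(fixed_fields) + [sep.join(assertion_urls)]


# Made-up example values, for illustration only.
fixed = ["variant-1", "GENE"]
urls = ["https://example.org/a/1", "https://example.org/a/2"]

print(len(build_row_unjoined(fixed, urls)))  # one column per URL
print(len(build_row_joined(fixed, urls)))    # always len(fixed) + 1
```

With joining in place, every record has the same number of tab-separated fields regardless of how many assertions it links to.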