alexeliotlash closed this issue 6 months ago.
I don't have an answer on this yet, but I also downloaded syn51703772 and couldn't find JP-TRANS-2 either. So at least we see the same thing so far :-)
OK... I have now confirmed that hdash has a meta_cache table, and I see 5 records for syn51703772. So this could be a caching issue.
Hi Alex, this is actually not a caching issue.
Rather, we have two files: syn39282351 and syn51703772.
Both files use the same primary HTAN IDs; e.g., both files claim to be annotating HTA10_0000_06037. This is not allowed, and it produces a misleading validation error message, e.g.:
HTA10_0000_06037 references parent ID=JP_Desc_3, but no such ID exists. [Error occurred while processing file: syn51703772 of type MassSpectrometryLevel1].
Because HTA10_0000_06037 is defined as the primary key in both files, hdash cannot distinguish between the two files. But if you look at syn39282351, you will see that HTA10_0000_06037 does reference parent ID=JP_Desc_3.
The error message should probably say:
HTA10_0000_06037 references parent ID=JP_Desc_3, but no such ID exists. [Error occurred while processing file: syn39282351 of type OtherAssay].
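To make the misattribution concrete, here is a minimal Python sketch. It assumes a simplified model in which the validator stores one source file per primary ID plus a list of child-to-parent links. The `IdGraph` class and its methods are hypothetical and are not hdash's actual implementation. Because the source map is keyed by primary ID alone, the second file overwrites the first, so the dangling JP_Desc_3 link that came from syn39282351 gets reported against syn51703772.

```python
# Hypothetical, simplified model of the ID graph; NOT hdash's actual code.
from dataclasses import dataclass, field


@dataclass
class IdGraph:
    # primary ID -> (synapse_id, file_type) of the file that defined it.
    # Keyed by primary ID alone, so a later file silently overwrites an earlier one.
    node_source: dict[str, tuple[str, str]] = field(default_factory=dict)
    # (child ID, parent ID) links collected from every processed file.
    edges: list[tuple[str, str]] = field(default_factory=list)

    def add_record(self, primary_id: str, parent_ids: list[str],
                   synapse_id: str, file_type: str) -> None:
        self.node_source[primary_id] = (synapse_id, file_type)
        for parent_id in parent_ids:
            self.edges.append((primary_id, parent_id))

    def check_links(self) -> list[str]:
        errors = []
        for child_id, parent_id in self.edges:
            if parent_id not in self.node_source:
                # The error is attributed to whichever file defined child_id LAST,
                # not to the file that actually contained this parent link.
                synapse_id, file_type = self.node_source[child_id]
                errors.append(
                    f"{child_id} references parent ID={parent_id}, but no such ID exists. "
                    f"[Error occurred while processing file: {synapse_id} of type {file_type}]."
                )
        return errors


graph = IdGraph()
# syn39282351 (OtherAssay) links HTA10_0000_06037 to the missing parent JP_Desc_3.
graph.add_record("HTA10_0000_06037", ["JP_Desc_3"], "syn39282351", "OtherAssay")
# syn51703772 reuses the same primary ID; its parents are omitted here for brevity.
graph.add_record("HTA10_0000_06037", [], "syn51703772", "MassSpectrometryLevel1")

errors = graph.check_links()
for e in errors:
    print(e)
```

Running this prints the same misleading message quoted above, naming syn51703772 instead of syn39282351.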
To fix this issue, I think we have to fix the meta files themselves.
This might open a can of worms, but I have now added a duplicate Primary ID check.
See: https://hdash.website-us-east-1.linodeobjects.com/HTA10.html
Good news is that we now have the root problem:
Primary ID HTA10_0000_06191 has already been defined in OtherAssay. [Error occurred while processing file: syn51703772 of type MassSpectrometryLevel1].
Bad news is that Stanford has lots of duplicate primary IDs.
In the error message of the duplicate Primary ID check, could you add the file in which the ID was previously defined? For example, change "Primary ID HTA10_07_00102001 has already been defined in BulkRNA-seqLevel1. [Error occurred while processing file: syn39282161 of type BulkRNA-seqLevel1]." to something like "Primary ID HTA10_07_00102001 has already been defined in file synXXXXXXXX of type BulkRNA-seqLevel1. [Error occurred while processing file: syn39282161 of type BulkRNA-seqLevel1]."
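One way to produce a message in the requested format is to remember, for each primary ID, the first file that defined it. The sketch below assumes files are processed in order as (Synapse ID, file type, primary IDs) tuples; `find_duplicate_primary_ids` is a hypothetical helper, not part of hdash. The example input reuses the two files discussed earlier in this thread.

```python
# Hypothetical sketch of a duplicate primary ID check that keeps provenance;
# names are illustrative, NOT hdash's actual API.
from collections.abc import Iterable


def find_duplicate_primary_ids(
    files: Iterable[tuple[str, str, list[str]]],
) -> list[str]:
    """files yields (synapse_id, file_type, primary_ids) in processing order."""
    # primary ID -> (synapse_id, file_type) of the FIRST file that defined it
    first_defined: dict[str, tuple[str, str]] = {}
    errors: list[str] = []
    for synapse_id, file_type, primary_ids in files:
        for primary_id in primary_ids:
            if primary_id in first_defined:
                prev_syn, prev_type = first_defined[primary_id]
                errors.append(
                    f"Primary ID {primary_id} has already been defined in file "
                    f"{prev_syn} of type {prev_type}. "
                    f"[Error occurred while processing file: {synapse_id} of type {file_type}]."
                )
            else:
                # Only record the first definition; later ones are reported, not stored.
                first_defined[primary_id] = (synapse_id, file_type)
    return errors


files = [
    ("syn39282351", "OtherAssay", ["HTA10_0000_06037"]),
    ("syn51703772", "MassSpectrometryLevel1", ["HTA10_0000_06037"]),
]
errors = find_duplicate_primary_ids(files)
for e in errors:
    print(e)
```

Keeping the first definition instead of overwriting it is what lets the message name both the earlier file and the file currently being processed.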
@alexeliotlash I added this suggestion, you can see here: https://hdash.website-us-east-1.linodeobjects.com/HTA10.html
Are we good to close this issue now?
Looks good. You can close the issue. Thanks.
Closing!! :-)
It seems some file error alerts persist even after the centers have corrected the files. For example, https://hdash.website-us-east-1.linodeobjects.com/HTA10.html shows 398 validation errors, including multiple "links connect" errors referring to ID=JP-TRANS-2.
Take one of those errors: "HTA10_0000_06191 references parent ID=JP-TRANS-2, but no such ID exists. [Error occurred while processing file: syn51703772 of type MassSpectrometryLevel1]." When I freshly download syn51703772 from Synapse, I can't find any references to JP-TRANS-2.
We thought perhaps hdash is caching files locally somewhere, and that this cache might need to be cleared so hdash can be run from scratch?