m2ms / fragalysis-frontend

The React, Redux frontend built by webpack

Incomplete data uploaded or displayed in Fragalysis #1266

Closed Waztom closed 6 months ago

Waztom commented 8 months ago

77 datasets were uploaded to Fragalysis V2, but only 63 can be viewed in Fragalysis.

@kaliif, please can you confirm how many datasets were saved to the DB (Target ID = 1)? We need to see why datasets are missing and whether it is backend or frontend related (possibly needs @boriskovar-m2ms).

kaliif commented 8 months ago

@Waztom 63 site_observation objects (assuming this is what you mean?) for target 1. Looking at this upload's meta_aligner.yaml file, there are 63 entries from which site_observations are extracted. Are you sure 77 is the correct number?

Waztom commented 8 months ago

@kaliif can confirm there are 78 expected datasets - you can find them in the "upload_1/aligned_files" folder (the output from XCA), which is zipped for upload to Fragalysis.

@tdudgeon please see Kalev's comment above (only 63 datasets found in the meta_aligner.yaml), do you have any idea why there is a mismatch with the number of expected datasets?

Waztom commented 8 months ago

@Waztom need to get logs. @tdudgeon logs need to be added to folder for zipping/upload to Fragalysis.

tdudgeon commented 8 months ago

Yes, I'm going to need to see the logs or see this running in action to investigate this. In principle the output YAML should correspond exactly to the files that are present, but this does not seem to be the case here.

tdudgeon commented 8 months ago

@Waztom @ConorFWild I'm trying to process the lb32627-65 dataset to reproduce this error.

But I'm hitting this problem:

ValueError: Chain B not found (only [G H A])

We've seen this before with other sets. The PDB chains have for some reason been renamed and are not what is specified in the crystalforms.yaml.
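
A mismatch like this can be caught before running the aligner by comparing the chain IDs expected by the crystalforms definition with those actually present in the PDB file. The following is only a minimal sketch: the function names are hypothetical and not part of any tool mentioned in this thread, and it relies only on the fixed-column PDB convention that the chain ID sits in column 22 of ATOM/HETATM records.

```python
def chains_in_pdb(pdb_text):
    """Return chain IDs in order of first appearance in ATOM/HETATM records."""
    chains = []
    for line in pdb_text.splitlines():
        # In the fixed-column PDB format the chain ID is column 22 (index 21).
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            chain = line[21]
            if chain.strip() and chain not in chains:
                chains.append(chain)
    return chains


def missing_chains(expected, pdb_text):
    """Return the expected chains that are absent from the PDB text."""
    found = chains_in_pdb(pdb_text)
    return [chain for chain in expected if chain not in found]
```

Given a file containing only chains G, H and A, `missing_chains(["A", "B"], pdb_text)` would report `["B"]`, which corresponds to the error above.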

This is NOT the cause of the missing data, but an earlier error that prevents the aligner from being run. Possibly the data has changed since it was originally processed?

But whatever, it's going to need to be fixed before the data can be processed.

I also found that the crystalforms are specified in the xtalforms.yaml file. This should be renamed to crystalforms.yaml to avoid complications, and the assemblies.yaml is no longer needed.

tdudgeon commented 8 months ago

I have resolved the chain problem (yes, it was a data problem).

Now when I run I do see a discrepancy between the number of files and the number of entries in the output YAML. And tellingly, in the logs I see things like this:

INFO: looking at XX01ZVNS2B-x1680
WARN: skipping XX01ZVNS2B-x1680 as aligned structures not found

Presumably either

Need to investigate further to establish what is happening.
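
To get the full list of affected crystals, the warnings can be pulled out of a log file with a small script. This is a rough sketch that assumes every skip is logged in exactly the form quoted above; the function name is made up for illustration.

```python
import re

# Matches the warning format quoted in this thread; assumed to be stable.
SKIP_PATTERN = re.compile(
    r"^WARN: skipping (\S+) as aligned structures not found$"
)


def skipped_crystals(log_text):
    """Return the names of skipped crystals, in order, without duplicates."""
    skipped = []
    for line in log_text.splitlines():
        match = SKIP_PATTERN.match(line.strip())
        if match and match.group(1) not in skipped:
            skipped.append(match.group(1))
    return skipped
```

Comparing the length of this list with the number of crystals that lack alignments in the output is a quick way to check whether the warnings account for the whole discrepancy.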

But also, even accepting this as an edge case that causes random failures, the YAML still needs to be consistent with the files. So when files are missing, that entry must not appear in the YAML.

tdudgeon commented 8 months ago

The YAML is in fact correct. Where alignments are not found there is still a section in the output for that crystal as the crystallographic data is still present, but the aligned_files section is (correctly) missing.
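
Crystals in this state can be listed directly from the parsed YAML. The sketch below assumes the output has already been loaded into a dict that maps each crystal name to its section; that top-level layout and the `crystallographic_files` key in the usage example are assumptions for illustration. Only the `aligned_files` section name comes from the description above.

```python
def crystals_without_alignments(crystals):
    """Return sorted names of crystals whose section has no aligned_files.

    `crystals` is assumed to map crystal name -> dict (or None) as parsed
    from the output YAML; a missing or empty aligned_files counts as absent.
    """
    return sorted(
        name
        for name, entry in crystals.items()
        if not (entry or {}).get("aligned_files")
    )


# Usage with a hand-written stand-in for the parsed YAML (hypothetical layout).
example = {
    "crystal-with-alignments": {
        "crystallographic_files": {},
        "aligned_files": {"A": {}},
    },
    "XX01ZVNS2B-x1680": {"crystallographic_files": {}},
}
missing = crystals_without_alignments(example)
```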

So what remains is to work out when the alignments are missing for some crystals. In this lb32627-65 data the following crystals all show the "aligned structures not found" warning: XX01ZVNS2B-x0182, XX01ZVNS2B-x0846, XX01ZVNS2B-x1597, XX01ZVNS2B-x1680, XX01ZVNS2B-x0075.

@ConorFWild I think this is for you to investigate?

mwinokan commented 6 months ago

The XX01ZVNS2B data has been successfully uploaded to v2 staging and ok'd by the data curator as part of #1221. @Waztom please re-open if necessary.