Waztom closed this issue 6 months ago
@Waztom There are 63 site_observation objects (assuming this is what you mean?) for target 1. Looking at this upload's meta_aligner.yaml file, there are 63 entries from which site_observations are extracted. Are you sure 77 is the correct number?
@kaliif I can confirm there are 78 expected datasets. You can find them in the "upload_1/aligned_files" folder, which is the output from XCA and is zipped for upload to Fragalysis.
@tdudgeon please see Kalev's comment above (only 63 datasets found in the meta_aligner.yaml). Do you have any idea why there is a mismatch with the number of expected datasets?
@Waztom need to get logs. @tdudgeon logs need to be added to folder for zipping/upload to Fragalysis.
Yes, I'm going to need to see the logs or see this running in action to investigate this. In principle the output YAML should correspond exactly to the files that are present, but this does not seem to be the case here.
@Waztom @ConorFWild I'm trying to process the lb32627-65 dataset to reproduce this error.
But I'm hitting this problem:
ValueError: Chain B not found (only [G H A])
We've seen this before with other sets. The PDB chains have for some reason been renamed and are not what is specified in the crystalforms.yaml file.
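For anyone hitting the same error, a quick pre-check of the chain IDs can help before running the aligner. This is only a rough sketch and not part of the aligner itself: the function names are made up for illustration, and it relies solely on the fixed-width PDB format, where the chain ID sits in column 22 of ATOM/HETATM records. The expected chains would have to be read from crystalforms.yaml by hand or by other means.

```python
def chains_in_pdb(pdb_text):
    """Return chain IDs from ATOM/HETATM records, in order of first appearance."""
    chains = []
    for line in pdb_text.splitlines():
        # Record name occupies columns 1-6; chain ID is column 22 (index 21).
        if line[:6] in ("ATOM  ", "HETATM") and len(line) > 21:
            chain_id = line[21]
            if chain_id not in chains:
                chains.append(chain_id)
    return chains


def missing_chains(pdb_text, expected):
    """Return the expected chain IDs that do not occur in the PDB text.

    `expected` is a hypothetical input: the chains named for this crystal
    form in crystalforms.yaml.
    """
    found = chains_in_pdb(pdb_text)
    return [chain for chain in expected if chain not in found]
```

With a structure containing chains G, H and A and an expected chain B, `missing_chains` would report B, which matches the situation in the error above.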
This is NOT the cause of the missing data, but an earlier error that prevents the aligner from being run. Possibly the data has changed since it was originally processed?
But whatever, it's going to need to be fixed before the data can be processed.
I also found that the crystalforms are specified in the xtalforms.yaml file. This should be renamed to crystalforms.yaml to avoid complications, and the assemblies.yaml file is no longer needed.
I have resolved the chain problem (yes, it was a data problem).
Now when I run I do see a discrepancy between the number of files and the number of entries in the output YAML. And tellingly, in the logs I see things like this:
INFO: looking at XX01ZVNS2B-x1680
WARN: skipping XX01ZVNS2B-x1680 as aligned structures not found
Presumably either
Need to investigate further to establish what is happening.
But also, even accepting this as an edge case that causes random failures, the YAML still needs to be consistent with the files. So when files are missing that entry must not appear in the YAML.
The YAML is in fact correct. Where alignments are not found there is still a section in the output for that crystal, as the crystallographic data is still present, but the aligned_files section is (correctly) missing.
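To count how many crystals actually have alignments, the entries can be split on whether they carry an aligned_files section. The sketch below is an assumption about the layout, not a description of the real schema: it supposes the parsed YAML gives a mapping from crystal name to that crystal's section, and the function name is hypothetical. Only the aligned_files key is taken from the output described above.

```python
def split_by_alignment(crystals):
    """Split crystal names by whether their section has aligned_files.

    `crystals` is assumed to be a dict of crystal name -> section dict,
    e.g. the relevant part of meta_aligner.yaml after parsing it with a
    YAML loader. The real file may nest this differently.
    """
    aligned, unaligned = [], []
    for name, section in crystals.items():
        # A missing or empty aligned_files section counts as "no alignments".
        if isinstance(section, dict) and section.get("aligned_files"):
            aligned.append(name)
        else:
            unaligned.append(name)
    return aligned, unaligned
```

If the assumption holds, the length of the first list is the number to compare against what Fragalysis shows, and the second list names the crystals that have a section but no aligned structures.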
So what remains is to work out when the alignments are missing for some crystals. In this lb32627-65 data the following crystals all show the "aligned structures not found" warning:
XX01ZVNS2B-x0182
XX01ZVNS2B-x0846
XX01ZVNS2B-x1597
XX01ZVNS2B-x1680
XX01ZVNS2B-x0075
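A list like the one above can be pulled out of the logs rather than collected by eye. This is a minimal sketch that assumes the warning always has the exact wording shown in the log excerpt earlier in this thread; the function name is made up for illustration.

```python
import re

# Matches lines such as:
#   WARN: skipping XX01ZVNS2B-x1680 as aligned structures not found
SKIP_PATTERN = re.compile(r"^WARN: skipping (\S+) as aligned structures not found$")


def skipped_crystals(log_text):
    """Return crystal names from 'aligned structures not found' warnings.

    Names are returned in order of first appearance, without duplicates.
    """
    skipped = []
    for line in log_text.splitlines():
        match = SKIP_PATTERN.match(line.strip())
        if match and match.group(1) not in skipped:
            skipped.append(match.group(1))
    return skipped
```

Running this over the full log text should give the set of crystals to check first when comparing the uploaded files against what is visible in Fragalysis.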
@ConorFWild I think this is for you to investigate?
The XX01ZVNS2B data has been successfully uploaded to v2 staging and ok'd by the data curator as part of #1221. @Waztom please re-open if necessary
77 datasets were uploaded to Fragalysis V2, but only 63 can be viewed in Fragalysis.
@kaliif, please can you confirm how many datasets were saved to the DB (Target ID = 1)? We need to see why datasets are missing and whether it is backend or frontend related (possibly needs @boriskovar-m2ms).