xchem / fragalysis-loader

Django backend loader for fragalysis app
Apache License 2.0
1 stars 4 forks source link

Load against an empty stack results in "NEW_DATA not found" - shutil.move() or check error? #45

Closed alanbchristie closed 4 years ago

alanbchristie commented 4 years ago

Using stack and loader 1.0.7 (but error is probably resident in the latest loader)

Starting a stack from scratch (i.e. new stack, new database) I attempt to run the loader to load data from 2020-09-15T16. The load fails with the error NEW_DATA not found for /code/media/targets/mArh/aligned.

The log of the loader is: -

2020-09-22T09:33:30.804273018Z +> Removing /code/media/NEW_DATA
2020-09-22T09:33:30.829663804Z +> Copying /mounted-media/2020-09-15T16 to /code/media/NEW_DATA...
2020-09-22T09:34:28.427573604Z +> Running loader...
2020-09-22T09:34:28.430515993Z Running loader...
2020-09-22T09:34:29.558073951Z /usr/local/lib/python3.7/dist-packages/frag/utils/network_utils.py:6: DeprecationWarning: The rdkit.Chem.MCS module is deprecated; please use rdkit.Chem.rdFMCS instead.
2020-09-22T09:34:29.558107932Z   from rdkit.Chem import MCS
2020-09-22T09:34:29.565789696Z /usr/lib/python3/dist-packages/rdkit/Chem/fmcs/fmcs.py:2453: DeprecationWarning: invalid escape sequence \d
2020-09-22T09:34:29.565841763Z   value_pat = re.compile("(\d+)")
2020-09-22T09:34:29.730060924Z Saving uploaded data to /code/media/targets
2020-09-22T09:34:29.730485312Z TARGET_PATH: /code/media/targets/mArh/aligned
2020-09-22T09:34:29.732620481Z No data to add: mArh
2020-09-22T09:34:29.732646031Z NEW_DATA not found for /code/media/targets/mArh/aligned
2020-09-22T09:34:30.830325979Z Generated private data
2020-09-22T09:34:30.925403931Z +> Done.

When I build a custom loader with additional log (based on the same backend) I see this: -

2020-09-22T10:47:09.417478927Z +> Removing /code/media/NEW_DATA
2020-09-22T10:47:09.432860772Z +> Copying /mounted-media/2020-09-15T16 to /code/media/NEW_DATA...
2020-09-22T10:48:11.585351393Z +> Running loader...
2020-09-22T10:48:11.589589303Z Running loader...
2020-09-22T10:48:12.625184855Z /usr/local/lib/python2.7/dist-packages/frag/utils/network_utils.py:6: DeprecationWarning: The rdkit.Chem.MCS module is deprecated; please use rdkit.Chem.rdFMCS instead.
2020-09-22T10:48:12.625226266Z   from rdkit.Chem import MCS
2020-09-22T10:48:12.834029593Z prefix="/code/media/NEW_DATA/"
2020-09-22T10:48:12.834089609Z target_name="mArh"
2020-09-22T10:48:12.834106204Z target_path="/code/media/NEW_DATA/mArh"
2020-09-22T10:48:12.8341205Z target_upload_path="/code/media/targets/mArh"
2020-09-22T10:48:12.834134306Z upload_path="/code/media/targets" 
2020-09-22T10:48:12.835157185Z target_path="/code/media/targets/mArh/aligned"
2020-09-22T10:48:12.837261421Z No data to add: mArh
2020-09-22T10:48:12.837309642Z new_data_file="/code/media/targets/mArh/aligned/NEW_DATA"
2020-09-22T10:48:12.837326107Z NEW_DATA not found for /code/media/targets/mArh/aligned
2020-09-22T10:48:13.722567462Z Generated private data
2020-09-22T10:48:13.787687687Z +> Done.

Notice that there are two upload paths: - upload_path and target_upload_path.

When the resultant container image is inspected the error is found to be correct ... there is no /code/media/targets/mArh/aligned/NEW_DATA file ... that's because the file is moved to /code/media/targets, not /code/media/targets/mArh.

... beacuse ... the loader runs this command: -

shutil.move(target_path, upload_path)

And upload_path is /code/media/targets. Consequently, when the data is moved the NEW_DATA file exists but it's new path is /code/media/targets/aligned/NEW_DATA not /code/media/targets/mArh/aligned/NEW_DATA.

Is this a fault with the shutil.move(target_path, upload_path) command or the subsequent check that results in the NEW_DATA not found error?

Either way - the loader doesn't load the data.

alanbchristie commented 4 years ago

Interestingly...

If you just run the loader again, i.e. same data, same stack the loader does not issue the NEW_DATA not found for /code/media/targets/mArh/aligned. Instead it continues (still unsuccessfully) to load (process) the data.

The loader logs (abbreviated) from the second run are: -

2020-09-22T11:00:14.898492239Z +> Removing /code/media/NEW_DATA
2020-09-22T11:00:14.919340095Z +> Copying /mounted-media/2020-09-15T16 to /code/media/NEW_DATA...
2020-09-22T11:01:16.268438666Z +> Running loader...
2020-09-22T11:01:16.272354911Z Running loader...
2020-09-22T11:01:17.421531108Z /usr/local/lib/python2.7/dist-packages/frag/utils/network_utils.py:6: DeprecationWarning: The rdkit.Chem.MCS module is deprecated; please use rdkit.Chem.rdFMCS instead.
2020-09-22T11:01:17.421561519Z   from rdkit.Chem import MCS
2020-09-22T11:01:17.673530573Z prefix="/code/media/NEW_DATA/"
2020-09-22T11:01:17.673571021Z target_name="mArh"
2020-09-22T11:01:17.673578984Z target_path="/code/media/NEW_DATA/mArh"
2020-09-22T11:01:17.67358581Z target_upload_path="/code/media/targets/mArh"
2020-09-22T11:01:17.673592826Z upload_path="/code/media/targets" 
2020-09-22T11:01:17.676483584Z target_path="/code/media/targets/mArh/aligned"
2020-09-22T11:01:17.913643201Z /code/media/targets/mArh/aligned/*_bound.pdb
2020-09-22T11:01:17.914568097Z []
2020-09-22T11:01:17.915339431Z /code/media/targets/mArh/aligned/mArh-x0469_0_B/*_bound.pdb
2020-09-22T11:01:17.916382249Z ['/code/media/targets/mArh/aligned/mArh-x0469_0_B/mArh-x0469_0_B_bound.pdb']
2020-09-22T11:01:17.921002165Z /code/media/targets/mArh/aligned/mArh-x0711_0_A/*_bound.pdb
2020-09-22T11:01:17.921884898Z ['/code/media/targets/mArh/aligned/mArh-x0711_0_A/mArh-x0711_0_A_bound.pdb']
2020-09-22T11:01:17.92590135Z /code/media/targets/mArh/aligned/5CMS_0_L/*_bound.pdb
2020-09-22T11:01:17.926498678Z ['/code/media/targets/mArh/aligned/5CMS_0_L/5CMS_0_L_bound.pdb']
[...]
2020-09-22T11:01:59.007091744Z Skipping contacts - mArh-x1018_0_A
2020-09-22T11:01:59.007125591Z Skipping confidence - mArh-x1018_0_A
2020-09-22T11:01:59.007132882Z mArh-x1092_0_B
2020-09-22T11:01:59.008917828Z Path - /code/media/targets/mArh/aligned/mArh-x1092_0_B/mArh-x1092_0_B_pandda.map not found.
2020-09-22T11:01:59.009343221Z Path - /code/media/targets/mArh/aligned/mArh-x1092_0_B/mArh-x1092_0_B.mtz not found.
2020-09-22T11:01:59.009852866Z Path - /code/media/targets/mArh/aligned/mArh-x1092_0_B/mArh-x1092_0_B_contacts.json not found.
2020-09-22T11:01:59.010307145Z Path - /code/media/targets/mArh/aligned/mArh-x1092_0_B/mArh-x1092_0_B_lig_conf.json not found.
2020-09-22T11:01:59.010756777Z Path - /code/media/targets/mArh/aligned/mArh-x1092_0_B/mArh-x1092_0_B_acceptor.ccp4 not found.
2020-09-22T11:01:59.011214112Z Path - /code/media/targets/mArh/aligned/mArh-x1092_0_B/mArh-x1092_0_B_donor.ccp4 not found.
2020-09-22T11:01:59.011649961Z Path - /code/media/targets/mArh/aligned/mArh-x1092_0_B/mArh-x1092_0_B_apolar.ccp4 not found.
2020-09-22T11:01:59.023250236Z Protein created = True
2020-09-22T11:01:59.133402542Z OLD MOLS = 0
2020-09-22T11:01:59.197720738Z Skipping contacts - mArh-x1092_0_B
2020-09-22T11:01:59.197758991Z Skipping confidence - mArh-x1092_0_B
2020-09-22T11:01:59.223326061Z new_data_file="/code/media/targets/mArh/aligned/NEW_DATA"
2020-09-22T11:01:59.223773982Z Analysing target: mArh
2020-09-22T11:01:59.247915138Z Analysing 215 molecules for mArh
2020-09-22T11:01:59.261348785Z Traceback (most recent call last):
2020-09-22T11:01:59.261394642Z   File "loader.py", line 43, in <module>
2020-09-22T11:01:59.26141104Z     process_target(prefix, target_name, app=app)
2020-09-22T11:01:59.261421738Z   File "/code/loader/loaders.py", line 1047, in process_target
2020-09-22T11:01:59.26173952Z     analyse_target(target_name, target_path)
2020-09-22T11:01:59.26178382Z   File "/code/loader/loaders.py", line 748, in analyse_target
2020-09-22T11:01:59.261820059Z     target.metadata.save(
2020-09-22T11:01:59.261830382Z AttributeError: 'Target' object has no attribute 'metadata'
2020-09-22T11:01:59.363735646Z Sentry is attempting to send 1 pending error messages
2020-09-22T11:01:59.36378126Z Waiting up to 2.0 seconds
2020-09-22T11:01:59.363795796Z Press Ctrl-C to quit
2020-09-22T11:02:00.807315064Z Generated private data
2020-09-22T11:02:00.872591105Z +> Done.

The AttributeError: 'Target' object has no attribute 'metadata' is not of interest here, the observation is that simply running the loader again does not result in the error.

So that's something to think about - why does the 2nd run avoid the error?

alanbchristie commented 4 years ago

Just for information the problem continues to exist with stack and loader 2.0.0 but with a 2nd execution of the loader I do not get the AttributeError: 'Target' object has no attribute 'metadata'.

In summary it's broken in 1.0.7 but appears to work (if you run the loader twice) using 2.0.0.

alanbchristie commented 4 years ago

Looks like the latets commit fixes it.

Executed the fix (xchem/fragalysis-mount-loader:2.0.3) against xchem/fragalysis-stack:2.0.3 and the loader progresses to completion without an error.