scipion-em / scipion-pyworkflow

Underlying pyworkflow module for the Scipion framework
GNU General Public License v3.0

CreateOutput failed in test case of Xmipp-AngularGraphConsistency #406

Open jianyingzhu opened 1 year ago

jianyingzhu commented 1 year ago

Hi,

I am trying to run a test case of Xmipp-AngularGraphConsistency by running the following command:

./scipion3 tests xmipp3.tests.test_protocol_angular_graph_consistency.TestAngularGraphConsistency

It reports an error in the final step, createOutput:

run.stdout:

00410:   correlation with projection in Graph max direction: 0.9489219285714285 
00411:   correlation with assigned projection: 0.9584980000000001 
00412:   angular distance to maxGraph: 63.12582661341961
00413:   to be disabled: 70
00414:   FAILED: createOutput, step 5, time 2023-01-11 19:49:47.676809
00415:   *** Last status is failed 
00416:   ------------------- PROTOCOL FAILED (DONE 5/5)

run.stderr:

Traceback (most recent call last):
00826:     File "/Share/THUDATA/Softwares/anaconda3/envs/scipion3_2022/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 202, in run
00827:       self._run()
00828:     File "/Share/THUDATA/Softwares/anaconda3/envs/scipion3_2022/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 253, in _run
00829:       resultFiles = self._runFunc()
00830:     File "/Share/THUDATA/Softwares/anaconda3/envs/scipion3_2022/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 249, in _runFunc
00831:       return self._func(*self._args)
00832:     File "/Share/THUDATA/Softwares/anaconda3/envs/scipion3_2022/lib/python3.8/site-packages/xmipp3/protocols/protocol_angular_graph_consistency.py", line 199, in createOutput
00833:       readSetOfParticles(fnOutParticles, self.subsets[i])
00834:     File "/Share/THUDATA/Softwares/anaconda3/envs/scipion3_2022/lib/python3.8/site-packages/xmipp3/convert/convert.py", line 1081, in readSetOfParticles
00835:       readSetOfImages(filename, partSet, rowToParticle, **kwargs)
00836:     File "/Share/THUDATA/Softwares/anaconda3/envs/scipion3_2022/lib/python3.8/site-packages/xmipp3/convert/convert.py", line 1014, in readSetOfImages
00837:       imgSet.append(img)
00838:     File "/Share/THUDATA/Softwares/anaconda3/envs/scipion3_2022/lib/python3.8/site-packages/pwem/objects/data.py", line 1160, in append
00839:       EMSet.append(self, image)
00840:     File "/Share/THUDATA/Softwares/anaconda3/envs/scipion3_2022/lib/python3.8/site-packages/pyworkflow/object.py", line 1245, in append
00841:       self._insertItem(item)
00842:     File "/Share/THUDATA/Softwares/anaconda3/envs/scipion3_2022/lib/python3.8/site-packages/pyworkflow/object.py", line 1249, in _insertItem
00843:       self._getMapper().insert(item)
00844:     File "/Share/THUDATA/Softwares/anaconda3/envs/scipion3_2022/lib/python3.8/site-packages/pyworkflow/mapper/sqlite.py", line 772, in insert
00845:       self.db.insertObject(obj.getObjId(), obj.isEnabled(), obj.getObjLabel(), obj.getObjComment(),
00846:     File "/Share/THUDATA/Softwares/anaconda3/envs/scipion3_2022/lib/python3.8/site-packages/pyworkflow/mapper/sqlite.py", line 1247, in insertObject
00847:       self.executeCommand(self.INSERT_OBJECT, args)
00848:   sqlite3.IntegrityError: UNIQUE constraint failed: Objects.id
00849:   Protocol failed: UNIQUE constraint failed: Objects.id

It seems that all the calculations complete properly, but the program fails to write the output to a SQLite file.

Our databases are stored on a Lustre file system. I wonder whether this is similar to the SQLite-related I/O errors seen when working on an NFS share.

Thank you all for your development efforts and help!

jianyingzhu commented 1 year ago

BTW, the version of scipion-pyworkflow is 3.0.25, the version of scipion-em is 3.0.22, and the version of scipion-app is 3.0.11. None of them is the latest:

The package scipion-pyworkflow is out of date. Your version is 3.0.25, the latest is 3.0.29.
The package scipion-em is out of date. Your version is 3.0.22, the latest is 3.0.24.
The package scipion-app is out of date. Your version is 3.0.11, the latest is 3.0.12

azazellochg commented 1 year ago

Hi @jianyingzhu, thanks for reporting. I'm not familiar with this protocol, but I think the error is very simple: duplicated object IDs (objId). Nothing wrong with your setup.
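
For readers unfamiliar with this class of error, here is a minimal sketch of how inserting two rows with the same ID triggers an `sqlite3.IntegrityError`. The table below is a hypothetical, simplified stand-in and not the actual pyworkflow schema; only the table and column names are borrowed from the error message in the traceback.

```python
import sqlite3


def insert_with_duplicate_id():
    """Insert two rows with the same id; return (error message, row count)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical, simplified table: NOT the real pyworkflow schema.
    conn.execute("CREATE TABLE Objects (id INTEGER PRIMARY KEY, label TEXT)")
    conn.execute("INSERT INTO Objects (id, label) VALUES (?, ?)", (1, "particle A"))

    error = None
    try:
        # Same id as the first row, so the uniqueness check rejects it.
        conn.execute("INSERT INTO Objects (id, label) VALUES (?, ?)", (1, "particle B"))
    except sqlite3.IntegrityError as exc:
        error = str(exc)

    count = conn.execute("SELECT COUNT(*) FROM Objects").fetchone()[0]
    conn.close()
    return error, count


error, count = insert_with_duplicate_id()
print("IntegrityError:", error)
print("rows stored:", count)
```

The second insert is rejected and only the first row is stored. This kind of failure comes from the data being inserted, not from the file system the database lives on.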

pconesa commented 1 year ago

I've checked the automatic testing server and this test seems to be passing --> http://scipion-test.cnb.csic.es:9980/#/builders/19/builds/253/steps/104/logs/stdio

I've also run it locally and it passed.

As Grigory said, the error does not seem related to your setup. Maybe the test uses a random seed that causes it to fail in your case?

Does it always fail?

What happens with other tests? Do they run fine?

jianyingzhu commented 1 year ago

Hi,

I suspected that the sqlite3 package used by pyworkflow on our platform was too old and caused the error, so I updated scipion-pyworkflow, scipion-em and scipion-app to the latest versions with the command scipion3 update. After that, the test case passed. I can also run the protocol successfully on our real data.

I noticed that your test run (http://scipion-test.cnb.csic.es:9980/#/builders/19/builds/253/steps/104/logs/stdio) takes 2236.512 secs (~37 min), but my test run took 2 h 53 min with Threads 1 MPI 4, and another identical run created with 'copy' took 2 days 6 h 54 min with Threads 1 MPI 30.

Our platform is a single GPU node with 4 V100 cards, so the run time seems a little bit weird to me.
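
To put the run times quoted in this comment side by side, here is a small sketch that converts them to seconds and computes the slowdown relative to the test server. The helper name `to_seconds` is made up for illustration; the figures are the ones reported in this thread.

```python
def to_seconds(days=0, hours=0, minutes=0, seconds=0.0):
    """Convert a duration given in mixed units to seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


server = to_seconds(seconds=2236.512)          # test server run
local_mpi4 = to_seconds(hours=2, minutes=53)   # Threads 1, MPI 4
local_mpi30 = to_seconds(days=2, hours=6, minutes=54)  # Threads 1, MPI 30

for name, value in [("MPI 4", local_mpi4), ("MPI 30", local_mpi30)]:
    print(f"{name}: {value:.0f} s, {value / server:.1f}x the server time")
```

Note that the run with more MPI processes is the slower one, which is the opposite of what one would expect if the extra processes were helping.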

pconesa commented 1 year ago

Yes, that seems too slow. I'm not sure whether there is a way to get a faster sqlite3 on your system.

If there is, note that sqlite3 lives inside the scipion3 conda environment, in case you find a way to replace it.
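
A quick way to see which SQLite library a given Python environment is linked against is the standard `sqlite3` module. This is only a sketch: it says something about Scipion only if it is run with the Python interpreter of the Scipion conda environment, and the helper name `sqlite_library_version` is hypothetical.

```python
import sqlite3
import sys


def sqlite_library_version():
    """Return the version of the SQLite C library used by this interpreter."""
    return sqlite3.sqlite_version, sqlite3.sqlite_version_info


version, version_info = sqlite_library_version()
print("Python executable:", sys.executable)
print("SQLite library version:", version)
```

Comparing the printed version before and after an update shows whether the SQLite library actually changed.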

pconesa commented 1 year ago

What is the case for other tests? Is this test something you are interested in?

jianyingzhu commented 1 year ago

Other tests pass successfully.

I am interested in this test because the job failed on my real data with the same error. After upgrading, my job now runs successfully. The run time is a bit long but within an acceptable range.

Thank you very much!

azazellochg commented 1 year ago

I'm running this test now with xmipp 3.22.04. The speed-limiting factor is xmipp_mpi_angular_assignment_mag, which runs only on the CPU and uses all available CPU cores (divided between 4 MPIs in this test protocol), despite the description saying it only uses 4 threads by default:

00388:   approx. memory to allocate: 2532 MB
00389:   simultaneous MPI processes: 4
00390:   total available system memory: 128668 MB
00391:   4412 reference images of 60 x 60
00392:   105 exp images of 60 x 60 in this group
00393:   Sampling: 7.08
00394:   Angular step: 3
00395:   Maximum shift: 6
00396:   threads: 4
00397:   ref vol size: 60 x 60 x 60

On my machine the test finished successfully in 1214 sec, no sqlite errors.
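
As a rough sanity check, one can compare the number of workers a run asks for with the number of cores on the machine. The sketch below assumes that every MPI process starts the number of threads printed in the log, which this thread does not confirm; the function name and the core counts in the example calls are hypothetical.

```python
import os


def oversubscription(mpi_procs, threads_per_proc, cores=None):
    """Return requested workers divided by available cores.

    A value above 1.0 means more workers than cores (oversubscription).
    """
    if cores is None:
        cores = os.cpu_count() or 1  # cpu_count() may return None
    return (mpi_procs * threads_per_proc) / cores


# Assumption: 4 threads per MPI process, as printed in the log above.
print("this machine, MPI 4:", oversubscription(4, 4))
print("this machine, MPI 30:", oversubscription(30, 4))
```

If the ratio is well above 1.0, the processes compete for the same cores, which could be one reason a run with more MPI processes ends up slower.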

I think this one is for the Xmipp team.