townsk closed this issue 3 years ago.
Okay, the pyarrow issue is resolved; now I'm having issues with pandas, specifically loading the module pandas._libs.interval. See the error below:
Hey. The pyarrow issue is a bit strange because we are using Nextflow with conda!
Here we define the conda environment for the filter_barcode step: https://github.com/shendurelab/MPRAflow/blob/fd5ff26b04686196ac37ed849afbcfc01b303b3f/association.nf#L494
And here you can find pyarrow in the environment: https://github.com/shendurelab/MPRAflow/blob/fd5ff26b04686196ac37ed849afbcfc01b303b3f/conf/mpraflow_py36.yml#L108
Maybe your Nextflow run does not use conda, only your base environment? This becomes very important for the count step because there we have scripts for both Python 2 and Python 3.
When you run the script yourself, you also have to use the environment:
conda env create -n mpraflow_p36 -f conf/mpraflow_py36.yml
conda activate mpraflow_p36
python src/nf_filter_barcodes.py
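To confirm which interpreter a run actually uses, and whether the modules discussed in this thread import in it, a minimal standard-library sketch like the one below could be run inside the activated environment. The list of modules to check is an assumption based on the errors reported here (pyarrow, pandas, pandas._libs.interval); extend it as needed.

```python
import importlib
import sys


def check_modules(names):
    """Try to import each module and return {name: status string}."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = "ok"
        except Exception as exc:  # ModuleNotFoundError, or e.g. a binary-incompatibility error
            results[name] = f"FAILED: {exc!r}"
    return results


if __name__ == "__main__":
    # Inside the mpraflow_p36 environment this path should point into that
    # environment, not into the base installation.
    print("interpreter:", sys.executable)
    print("version:", sys.version.split()[0])
    # Modules mentioned in this thread (assumed list).
    for mod, status in check_modules(["pyarrow", "pandas", "pandas._libs.interval"]).items():
        print(f"{mod}: {status}")
```

If `interpreter` points at the base installation, the run is most likely not using the conda environment, which would explain a `ModuleNotFoundError` for a package that is listed in the environment file.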
@townsk you added some comments yesterday (I saw them in my email), but they are not listed anymore. Just let me know if you need further help.
@visze Thanks for checking in. I worked through the issues I posted, so I removed them, but I am still having trouble. When I run the script myself, I don't get any errors, but it runs for hours without completing. Do you have an estimated runtime for the filter barcode process?
The runtime of the nf_filter_barcodes.py script depends on the size of the input, but in theory it should be one of the quicker scripts: definitely under 1 hour.
But maybe you have some issues with generating the violin plots. Can you comment out these lines: https://github.com/shendurelab/MPRAflow/blob/fd5ff26b04686196ac37ed849afbcfc01b303b3f/src/nf_filter_barcodes.py#L133 and https://github.com/shendurelab/MPRAflow/blob/fd5ff26b04686196ac37ed849afbcfc01b303b3f/src/nf_filter_barcodes.py#L142
Plotting the violin plots does seem to be the issue: once lines 133 and 142 are commented out, the script runs and creates the filtered barcode pickle file.
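Instead of commenting lines out by hand, one option is to make the plots optional. The sketch below is hypothetical and not MPRAflow's code: the `--no-plots` flag, the function names and the `barcode_counts` key are illustrative assumptions. It writes the pickle before any plotting and, when plots are requested, selects matplotlib's non-interactive Agg backend, which renders to files without needing a display. Whether an interactive backend is what stalls here was not confirmed in this thread.

```python
import argparse
import pickle


def save_filtered(table, path):
    """Write the filtered barcode table before any plotting is attempted."""
    with open(path, "wb") as handle:
        pickle.dump(table, handle)


def plot_violins(values, out_png):
    """Draw a violin plot to a PNG file (hypothetical helper)."""
    # Imported lazily so that --no-plots runs never touch matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # headless backend: renders to files, needs no display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.violinplot(values)
    fig.savefig(out_png)
    plt.close(fig)


def main(table, argv=None):
    """Save `table`, then plot unless --no-plots is given (hypothetical CLI)."""
    parser = argparse.ArgumentParser(description="Filter barcodes (sketch).")
    parser.add_argument("--output", required=True, help="pickle file to write")
    parser.add_argument("--no-plots", action="store_true",
                        help="skip the violin plots (hypothetical flag)")
    args = parser.parse_args(argv)

    save_filtered(table, args.output)  # main result exists even if plotting fails
    if not args.no_plots:
        plot_violins(table["barcode_counts"], args.output + ".violin.png")
    return 0
```

For example, `main(filtered_table, ["--output", "filtered.pkl", "--no-plots"])` would write only the pickle, where `filtered_table` stands in for whatever object the real script produces.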
Thanks, good to know that it works now. But it is strange that it fails when creating the plots. I can imagine 3 possible issues:
But I can only debug this when you give me your input data.
I will close this because I cannot debug it without the data.
Please reopen if necessary.
Any suggestions for "ModuleNotFoundError: No module named 'pyarrow'"?
Prior to running the association script, I installed pyarrow with: conda install pyarrow -c conda-forge
Thanks!