google / deepconsensus

DeepConsensus uses gap-aware sequence transformers to correct errors in Pacific Biosciences (PacBio) Circular Consensus Sequencing (CCS) data.
BSD 3-Clause "New" or "Revised" License

Error: ModuleNotFoundError: No module named 'pandas._libs.interval' #47

Closed: gevro closed this issue 1 year ago

gevro commented 1 year ago

Hi, I'm getting this error with the new DeepConsensus 1.0.0. I'm running the exact same command that worked with the previous DeepConsensus version. I'm running it from the Docker image.

2022-10-11 16:17:35.711032: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "/share/apps/python/3.8.6/intel/lib/python3.8/site-packages/pandas/__init__.py", line 30, in <module>
    from pandas._libs import hashtable as _hashtable, lib as _lib, tslib as _tslib
  File "/share/apps/python/3.8.6/intel/lib/python3.8/site-packages/pandas/_libs/__init__.py", line 13, in <module>
    from pandas._libs.interval import Interval
ModuleNotFoundError: No module named 'pandas._libs.interval'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/envs/bio/bin/deepconsensus", line 8, in <module>
    sys.exit(run())
  File "/opt/conda/envs/bio/lib/python3.9/site-packages/deepconsensus/cli.py", line 111, in run
    app.run(main, flags_parser=parse_flags)
  File "/share/apps/python/3.8.6/intel/lib/python3.8/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/share/apps/python/3.8.6/intel/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/opt/conda/envs/bio/lib/python3.9/site-packages/deepconsensus/cli.py", line 99, in main
    from deepconsensus.inference import quick_inference
  File "/opt/conda/envs/bio/lib/python3.9/site-packages/deepconsensus/inference/quick_inference.py", line 53, in <module>
    import pandas as pd
  File "/share/apps/python/3.8.6/intel/lib/python3.8/site-packages/pandas/__init__.py", line 34, in <module>
    raise ImportError(
ImportError: C extension: No module named 'pandas._libs.interval' not built. If you want to import pandas from the source directory, you may need to run 'python setup.py build_ext --inplace --force' to build the C extensions first.

danielecook commented 1 year ago

@gevro thank you for reporting. Can you share the command you are running?

gevro commented 1 year ago

singularity run -W /data -B /scratch/projects/bin/deepconsensus/model:/model -B `pwd` /scratch/projects/bin/deepconsensus/deepconsensus_1.0.0.sif deepconsensus run --batch_size=1024 --batch_zmws=100 --cpus 4 --max_passes 20 --subreads_to_ccs=subreads_to_ccs.bam --ccs_bam=ccs.bam --checkpoint=/model/checkpoint --output=output.deepconsensus.fastq

danielecook commented 1 year ago

@gevro please try running singularity with a clean environment. I think passing --cleanenv should work.

singularity run \
  -W /data \
  -B /scratch/projects/bin/deepconsensus/model:/model \
  -B `pwd` \
  --cleanenv \
  /scratch/projects/bin/deepconsensus/deepconsensus_1.0.0.sif \
  deepconsensus run \
    --batch_size=1024 \
    --batch_zmws=100 \
    --cpus 4 \
    --max_passes 20 \
    --subreads_to_ccs=subreads_to_ccs.bam \
    --ccs_bam=ccs.bam \
    --checkpoint=/model/checkpoint \
    --output=output.deepconsensus.fastq

The traceback shows that pandas is being imported from a Python 3.8 installation shared on your machine/HPC rather than from the container's own environment (/opt/conda/envs/bio, Python 3.9):

/share/apps/python/3.8.6/intel/lib/python3.8/site-packages/pandas/__init__.py

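These host packages were built for a different Python version than the one inside the container, and this appears to be what breaks the pandas import. By default, singularity passes host environment variables into the container, and --cleanenv prevents that.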
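
To check for this kind of leak from inside the container, a small diagnostic can help. The following is a minimal sketch, not part of DeepConsensus: the helper name `mismatched_python_paths` is hypothetical, and it assumes that site-packages directories contain a `pythonX.Y` path component, as the paths in the traceback do. It flags import paths that belong to a different Python version than the running interpreter.

```python
import re
import sys


def mismatched_python_paths(paths, running=None):
    """Return paths whose 'pythonX.Y' component differs from the running interpreter.

    Hypothetical helper for illustration; `running` is a (major, minor) tuple
    and defaults to the version of the interpreter executing this code.
    """
    if running is None:
        running = (sys.version_info.major, sys.version_info.minor)
    pattern = re.compile(r"python(\d+)\.(\d+)")
    flagged = []
    for path in paths:
        match = pattern.search(path)
        if match and (int(match.group(1)), int(match.group(2))) != tuple(running):
            flagged.append(path)
    return flagged


if __name__ == "__main__":
    # Inspect the import path of the interpreter that is actually running.
    for path in mismatched_python_paths(sys.path):
        print("built for a different Python:", path)
```

Run with the container's interpreter, any line it prints points at a directory that was probably injected from the host, for example through an environment variable such as PYTHONPATH.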

gevro commented 1 year ago

Now I'm getting this error:

2022-10-11 17:08:33.096036: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
FATAL Flags parsing error: Unknown command line flag 'max_passes'
Pass --helpshort or --helpfull to see help on flags.

danielecook commented 1 year ago

We removed the --max_passes flag in this release because we have not provided any models with alternative numbers of passes. Please try removing this flag:

singularity run \
  -W /data \
  -B /scratch/projects/bin/deepconsensus/model:/model \
  -B `pwd` \
  --cleanenv \
  /scratch/projects/bin/deepconsensus/deepconsensus_1.0.0.sif \
  deepconsensus run \
    --batch_size=1024 \
    --batch_zmws=100 \
    --cpus 4 \
    --subreads_to_ccs=subreads_to_ccs.bam \
    --ccs_bam=ccs.bam \
    --checkpoint=/model/checkpoint \
    --output=output.deepconsensus.fastq

Also, the values for --max_passes and --example_width are now taken from the model's params.json, so there is no need to set these flags.
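
To see which values a given model will use, the params.json file can be inspected directly. This is a rough sketch under stated assumptions: the helper name `read_model_settings` is hypothetical, and it assumes that the keys in params.json are named after the flags (`max_passes`, `example_width`), which this thread does not confirm. Missing keys are returned as `None` rather than raising an error.

```python
import json


def read_model_settings(params_path, keys=("max_passes", "example_width")):
    """Return selected settings from a params.json file.

    Hypothetical helper: the key names are assumed to match the former
    command-line flags. A key that is absent from the file maps to None.
    """
    with open(params_path, encoding="utf-8") as handle:
        params = json.load(handle)
    return {key: params.get(key) for key in keys}


# Example call; the path is an assumption based on the /model bind mount above:
# print(read_model_settings("/model/params.json"))
```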

gevro commented 1 year ago

OK, it's working now. Thanks.