google / deepconsensus

DeepConsensus uses gap-aware sequence transformers to correct errors in Pacific Biosciences (PacBio) Circular Consensus Sequencing (CCS) data.
BSD 3-Clause "New" or "Revised" License

issue with tensorflow #76

Closed. corkdagga closed this issue 3 months ago.

corkdagga commented 5 months ago

Hi,

I installed deepconsensus[cpu]=1.2.0 using pip within a conda environment (I do not have sudo privileges, so I cannot install from source).

I installed using: "conda install deepconsensus[cpu]=1.2.0 python==3.9" to get around the installation error described in issue https://github.com/google/deepconsensus/issues/69
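(For reference, the pip-based install inside a conda environment would look roughly like this. This is only a sketch; note that the [cpu] extra is pip syntax, which conda itself does not understand:)

    # Create and activate an environment, then install with pip inside it.
    conda create -n deepconsensus python=3.9
    conda activate deepconsensus
    # Quotes guard the [cpu] brackets in shells that glob them (e.g. zsh).
    pip install 'deepconsensus[cpu]==1.2.0'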

The installation worked correctly, but when I run deepconsensus I get the following error:

    2024-04-16 13:26:55.972174: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
    2024-04-16 13:26:56.053962: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
    2024-04-16 13:26:56.722497: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
    2024-04-16 13:26:56.727045: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
    2024-04-16 13:26:59.776981: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
    /data/horse/ws/pada358b-genome_assembly/conda/envs/DCpy39/lib/python3.9/site-packages/tensorflow_addons/utils/tfa_eol_msg.py:23: UserWarning:

TensorFlow Addons (TFA) has ended development and introduction of new features. TFA has entered a minimal maintenance and release mode until a planned end of life in May 2024. Please modify downstream libraries to take dependencies from other repositories in our TensorFlow community (e.g. Keras, Keras-CV, and Keras-NLP).

For more information see: https://github.com/tensorflow/addons/issues/2807

    warnings.warn(
    Traceback (most recent call last):
      File "/data/horse/ws/pada358b-genome_assembly/conda/envs/DCpy39/bin/deepconsensus", line 8, in <module>
        sys.exit(run())
      File "/data/horse/ws/pada358b-genome_assembly/conda/envs/DCpy39/lib/python3.9/site-packages/deepconsensus/cli.py", line 118, in run
        app.run(main, flags_parser=parse_flags)
      File "/data/horse/ws/pada358b-genome_assembly/conda/envs/DCpy39/lib/python3.9/site-packages/absl/app.py", line 312, in run
        _run_main(main, args)
      File "/data/horse/ws/pada358b-genome_assembly/conda/envs/DCpy39/lib/python3.9/site-packages/absl/app.py", line 258, in _run_main
        sys.exit(main(argv))
      File "/data/horse/ws/pada358b-genome_assembly/conda/envs/DCpy39/lib/python3.9/site-packages/deepconsensus/cli.py", line 103, in main
        app.run(quick_inference.main, argv=passed)
      File "/data/horse/ws/pada358b-genome_assembly/conda/envs/DCpy39/lib/python3.9/site-packages/absl/app.py", line 312, in run
        _run_main(main, args)
      File "/data/horse/ws/pada358b-genome_assembly/conda/envs/DCpy39/lib/python3.9/site-packages/absl/app.py", line 258, in _run_main
        sys.exit(main(argv))
      File "/data/horse/ws/pada358b-genome_assembly/conda/envs/DCpy39/lib/python3.9/site-packages/deepconsensus/inference/quick_inference.py", line 977, in main
        outcome_counter = run()
      File "/data/horse/ws/pada358b-genome_assembly/conda/envs/DCpy39/lib/python3.9/site-packages/deepconsensus/inference/quick_inference.py", line 803, in run
        params = model_utils.read_params_from_json(checkpoint_path=FLAGS.checkpoint)
      File "/data/horse/ws/pada358b-genome_assembly/conda/envs/DCpy39/lib/python3.9/site-packages/deepconsensus/models/model_utils.py", line 444, in read_params_from_json
        json.load(tf.io.gfile.GFile(json_path, 'r'))
      File "/data/horse/ws/pada358b-genome_assembly/conda/envs/DCpy39/lib/python3.9/json/__init__.py", line 293, in load
        return loads(fp.read(),
      File "/data/horse/ws/pada358b-genome_assembly/conda/envs/DCpy39/lib/python3.9/site-packages/tensorflow/python/lib/io/file_io.py", line 116, in read
        self._preread_check()
      File "/data/horse/ws/pada358b-genome_assembly/conda/envs/DCpy39/lib/python3.9/site-packages/tensorflow/python/lib/io/file_io.py", line 77, in _preread_check
        self._read_buf = _pywrap_file_io.BufferedInputStream(
    tensorflow.python.framework.errors_impl.NotFoundError: model/params.json; No such file or directory

I tried updating tensorflow to version 2.13.0, but that didn't fix the problem. I googled a lot and it seems to be a common problem, but I could not find a solution so far.

Any help getting this problem solved would be great.

pichuan commented 5 months ago

Hi @corkdagga , from the log above, it seems like you might not have this file:

    self._read_buf = _pywrap_file_io.BufferedInputStream(
    tensorflow.python.framework.errors_impl.NotFoundError: model/params.json; No such file or directory

Can you first check whether you have that file?
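For example, a minimal check (assuming the model was downloaded into a local model/ directory, as in the quick start):

    ls model/
    # Expected contents, per the quick start:
    #   checkpoint.data-00000-of-00001  checkpoint.index  params.json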

corkdagga commented 4 months ago

Hi Pichuan,

Sorry, I don't have much bioinformatics experience. Could you please let me know where I can find that file?

pichuan commented 4 months ago

Hi @corkdagga, have you followed the steps in https://github.com/google/deepconsensus/blob/r1.2/docs/quick_start.md ?

This section gives the path to the model, including that file: https://github.com/google/deepconsensus/blob/r1.2/docs/quick_start.md#download-example-data

Let me know if that works!

corkdagga commented 4 months ago

Hi @pichuan,

In the 'Quick Start for DeepConsensus' document, I followed the steps for running ccs and actc. I cannot use Docker, so I did not follow the Docker-based steps: I am running on an HPC where Docker is not available as a module, and I do not think I can install it myself. Instead, I installed DeepConsensus, ccs, and actc independently and ran the two tools with the settings given in the quick start to generate the required files.

I am unsure how I can follow the rest of the steps without Docker...

For the model: I was able to follow the steps and have now downloaded the model, but I am unsure how to get DeepConsensus to find it. Where should I place the following files:

    n1000.subreads.bam
    model/checkpoint.data-00000-of-00001
    model/checkpoint.index
    model/params.json

pichuan commented 4 months ago

Hi @corkdagga , In this step: https://github.com/google/deepconsensus/blob/r1.2/docs/quick_start.md#run-deepconsensus

It shows that you can use:

  --checkpoint=model/checkpoint \
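
The value is a checkpoint prefix, not a concrete file: TensorFlow expands the prefix to the matching .index and .data-* files. A minimal sketch, using the file names from earlier in this thread:

    # Given these files:
    #   model/checkpoint.data-00000-of-00001
    #   model/checkpoint.index
    #   model/params.json
    # pass the shared prefix, not one of the individual files:
    --checkpoint=model/checkpoint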

Let me know if that works for you.

corkdagga commented 4 months ago

Hi again @pichuan

I made some progress, but the run was still unsuccessful. I used the following command; the result is below. The output was far too large to copy in full, so I have copied only the last page:

srun deepconsensus run --subreads_to_ccs=PD049.CCS.actc.bam --ccs_bam=m54089_200615_125054.CCS.bam --checkpoint=/data/horse/ws/pada358b-genome_assembly/DC_model/model/checkpoint.index --output=PD049pacbio.output_DC.fastq

    [The run printed the raw values of every model weight before exiting; abbreviated here to the variable names and shapes, all under encoder_only_learned_values_transformer/Transformer/encode/encoder_stack/:]

    layer_3/ffn/pre_post_processing_wrapper_7/feed_forward_network_3/output_layer/kernel
    layer_3/ffn/pre_post_processing_wrapper_7/feed_forward_network_3/output_layer/bias             shape=(280,)
    layer_4/self_attention/pre_post_processing_wrapper_8/self_attention_4/query/kernel             shape=(280, 2, 140)
    layer_4/self_attention/pre_post_processing_wrapper_8/self_attention_4/key/kernel               shape=(280, 2, 140)
    layer_4/self_attention/pre_post_processing_wrapper_8/self_attention_4/value/kernel             shape=(280, 2, 140)
    layer_4/self_attention/pre_post_processing_wrapper_8/self_attention_4/output_transform/kernel  shape=(2, 140, 280)
    layer_4/ffn/pre_post_processing_wrapper_9/feed_forward_network_4/filter_layer/kernel           shape=(280, 2048)
    layer_4/ffn/pre_post_processing_wrapper_9/feed_forward_network_4/filter_layer/bias             shape=(2048,)
    layer_4/ffn/pre_post_processing_wrapper_9/feed_forward_network_4/output_layer/kernel           shape=(2048, 280)
    layer_4/ffn/pre_post_processing_wrapper_9/feed_forward_network_4/output_layer/bias             shape=(280,)

    srun: error: n1491: task 0: Exited with exit code 1

pichuan commented 4 months ago

Hi @corkdagga ,

In your update, you said you used:

    --checkpoint=/data/horse/ws/pada358b-genome_assembly/DC_model/model/checkpoint.index

Can you try using only the prefix, as https://github.com/google/deepconsensus/blob/r1.2/docs/quick_start.md#run-deepconsensus suggests:

    --checkpoint=/data/horse/ws/pada358b-genome_assembly/DC_model/model/checkpoint

I don't think it will work if you pass in the index file.
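For reference, the full command from your earlier message with that one change would be:

    srun deepconsensus run \
      --subreads_to_ccs=PD049.CCS.actc.bam \
      --ccs_bam=m54089_200615_125054.CCS.bam \
      --checkpoint=/data/horse/ws/pada358b-genome_assembly/DC_model/model/checkpoint \
      --output=PD049pacbio.output_DC.fastq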

Thanks!

corkdagga commented 4 months ago

Hi @pichuan

DeepConsensus ran successfully for a while, but unfortunately the following error then appeared:

    I0503 10:20:37.394586 140737354053440 quick_inference.py:770] Processed a batch of 100 ZMWs in 1.321 seconds
    I0503 10:20:37.467836 140737354053440 quick_inference.py:931] Processed 47000 ZMWs in 2796.068 seconds
    I0503 10:20:42.877036 140737354053440 quick_inference.py:693] Example summary: ran model=30 (4.73%; 0.168s) skip=604 (95.27%; 0.033s) total=634.
    I0503 10:20:42.886625 140737354053440 quick_inference.py:770] Processed a batch of 100 ZMWs in 1.193 seconds
    I0503 10:20:42.926704 140737354053440 quick_inference.py:931] Processed 47100 ZMWs in 2801.527 seconds
    I0503 10:20:48.733838 140737354053440 quick_inference.py:693] Example summary: ran model=117 (18.34%; 0.406s) skip=521 (81.66%; 0.030s) total=638.
    I0503 10:20:48.742551 140737354053440 quick_inference.py:770] Processed a batch of 100 ZMWs in 1.441 seconds
    I0503 10:20:48.783353 140737354053440 quick_inference.py:931] Processed 47200 ZMWs in 2807.384 seconds
    Traceback (most recent call last):
      File "/data/horse/ws/pada358b-genome_assembly/conda/envs/DCpy39/bin/deepconsensus", line 8, in <module>
        sys.exit(run())
      File "/data/horse/ws/pada358b-genome_assembly/conda/envs/DCpy39/lib/python3.9/site-packages/deepconsensus/cli.py", line 118, in run
        app.run(main, flags_parser=parse_flags)
      File "/data/horse/ws/pada358b-genome_assembly/conda/envs/DCpy39/lib/python3.9/site-packages/absl/app.py", line 312, in run
        _run_main(main, args)
      File "/data/horse/ws/pada358b-genome_assembly/conda/envs/DCpy39/lib/python3.9/site-packages/absl/app.py", line 258, in _run_main
        sys.exit(main(argv))
      File "/data/horse/ws/pada358b-genome_assembly/conda/envs/DCpy39/lib/python3.9/site-packages/deepconsensus/cli.py", line 103, in main
        app.run(quick_inference.main, argv=passed)
      File "/data/horse/ws/pada358b-genome_assembly/conda/envs/DCpy39/lib/python3.9/site-packages/absl/app.py", line 312, in run
        _run_main(main, args)
      File "/data/horse/ws/pada358b-genome_assembly/conda/envs/DCpy39/lib/python3.9/site-packages/absl/app.py", line 258, in _run_main
        sys.exit(main(argv))
      File "/data/horse/ws/pada358b-genome_assembly/conda/envs/DCpy39/lib/python3.9/site-packages/deepconsensus/inference/quick_inference.py", line 977, in main
        outcome_counter = run()
      File "/data/horse/ws/pada358b-genome_assembly/conda/envs/DCpy39/lib/python3.9/site-packages/deepconsensus/inference/quick_inference.py", line 912, in run
        for zmw, subreads, dc_config, window_widths in input_file_generator:
      File "/data/horse/ws/pada358b-genome_assembly/conda/envs/DCpy39/lib/python3.9/site-packages/deepconsensus/inference/quick_inference.py", line 480, in stream_bam
        for input_data in proc_feeder():
      File "/data/horse/ws/pada358b-genome_assembly/conda/envs/DCpy39/lib/python3.9/site-packages/deepconsensus/preprocess/pre_lib.py", line 1309, in proc_feeder
        for read_set in subread_grouper:
      File "/data/horse/ws/pada358b-genome_assembly/conda/envs/DCpy39/lib/python3.9/site-packages/deepconsensus/preprocess/pre_lib.py", line 73, in __next__
        read = next(self.bam_reader)
      File "pysam/libcalignmentfile.pyx", line 1876, in pysam.libcalignmentfile.AlignmentFile.__next__
    OSError: error -3 while reading file
    srun: error: n1555: task 0: Exited with exit code 1

corkdagga commented 4 months ago

Hi again @pichuan

A small update regarding the message above.

I reinstalled DeepConsensus locally on a new computer with a GPU. I installed it using pip (deepconsensus[gpu]==1.2.0) and everything seemed to go OK (I did, however, have difficulty installing with Docker).

Anyway, it installed correctly with pip, and I ran DeepConsensus again on the same data as before. However, I again received an error at exactly the same position as last time, after processing 47200 ZMWs. I therefore suspect it is most likely an error in my input files. Do you agree, and do you know how I could check and then fix that? Below is a portion of the output from the new run.

Thanks!

    I0507 08:57:24.521475 128752640931648 quick_inference.py:693] Example summary: ran model=53 (8.48%; 0.150s) skip=572 (91.52%; 0.047s) total=625.
    I0507 08:57:24.533131 128752640931648 quick_inference.py:770] Processed a batch of 100 ZMWs in 1.262 seconds
    I0507 08:57:24.570044 128752640931648 quick_inference.py:931] Processed 46100 ZMWs in 2692.503 seconds
    I0507 08:57:30.603274 128752640931648 quick_inference.py:693] Example summary: ran model=80 (12.58%; 0.224s) skip=556 (87.42%; 0.091s) total=636.
    I0507 08:57:30.621727 128752640931648 quick_inference.py:770] Processed a batch of 100 ZMWs in 1.502 seconds
    I0507 08:57:30.665697 128752640931648 quick_inference.py:931] Processed 46200 ZMWs in 2698.599 seconds
    I0507 08:57:36.891413 128752640931648 quick_inference.py:693] Example summary: ran model=78 (11.42%; 0.224s) skip=605 (88.58%; 0.054s) total=683.
    I0507 08:57:36.908673 128752640931648 quick_inference.py:770] Processed a batch of 100 ZMWs in 1.544 seconds
    I0507 08:57:36.957104 128752640931648 quick_inference.py:931] Processed 46300 ZMWs in 2704.890 seconds
    I0507 08:57:43.048226 128752640931648 quick_inference.py:693] Example summary: ran model=144 (21.69%; 0.258s) skip=520 (78.31%; 0.059s) total=664.
    I0507 08:57:43.059175 128752640931648 quick_inference.py:770] Processed a batch of 100 ZMWs in 1.392 seconds
    I0507 08:57:43.107503 128752640931648 quick_inference.py:931] Processed 46400 ZMWs in 2711.041 seconds
    I0507 08:57:48.900568 128752640931648 quick_inference.py:693] Example summary: ran model=75 (11.65%; 0.237s) skip=569 (88.35%; 0.048s) total=644.
    I0507 08:57:48.911521 128752640931648 quick_inference.py:770] Processed a batch of 100 ZMWs in 1.338 seconds
    I0507 08:57:48.953398 128752640931648 quick_inference.py:931] Processed 46500 ZMWs in 2716.886 seconds
    I0507 08:57:55.150791 128752640931648 quick_inference.py:693] Example summary: ran model=63 (10.77%; 0.163s) skip=522 (89.23%; 0.042s) total=585.
    I0507 08:57:55.160457 128752640931648 quick_inference.py:770] Processed a batch of 100 ZMWs in 1.324 seconds
    I0507 08:57:55.208146 128752640931648 quick_inference.py:931] Processed 46600 ZMWs in 2723.141 seconds
    I0507 08:58:02.075521 128752640931648 quick_inference.py:693] Example summary: ran model=80 (12.12%; 0.264s) skip=580 (87.88%; 0.076s) total=660.
    I0507 08:58:02.086779 128752640931648 quick_inference.py:770] Processed a batch of 100 ZMWs in 1.637 seconds
    I0507 08:58:02.146010 128752640931648 quick_inference.py:931] Processed 46700 ZMWs in 2730.079 seconds
    I0507 08:58:07.874739 128752640931648 quick_inference.py:693] Example summary: ran model=70 (11.25%; 0.184s) skip=552 (88.75%; 0.053s) total=622.
    I0507 08:58:07.886029 128752640931648 quick_inference.py:770] Processed a batch of 100 ZMWs in 1.266 seconds
    I0507 08:58:07.928063 128752640931648 quick_inference.py:931] Processed 46800 ZMWs in 2735.861 seconds
    I0507 08:58:14.496442 128752640931648 quick_inference.py:693] Example summary: ran model=120 (17.14%; 0.284s) skip=580 (82.86%; 0.049s) total=700.
    I0507 08:58:14.508665 128752640931648 quick_inference.py:770] Processed a batch of 100 ZMWs in 1.474 seconds
    I0507 08:58:14.554958 128752640931648 quick_inference.py:931] Processed 46900 ZMWs in 2742.488 seconds
    I0507 08:58:20.421319 128752640931648 quick_inference.py:693] Example summary: ran model=48 (7.78%; 0.175s) skip=569 (92.22%; 0.060s) total=617.
    I0507 08:58:20.439472 128752640931648 quick_inference.py:770] Processed a batch of 100 ZMWs in 1.281 seconds
    I0507 08:58:20.486044 128752640931648 quick_inference.py:931] Processed 47000 ZMWs in 2748.419 seconds
    I0507 08:58:25.846841 128752640931648 quick_inference.py:693] Example summary: ran model=30 (4.73%; 0.131s) skip=604 (95.27%; 0.049s) total=634.
    I0507 08:58:25.857717 128752640931648 quick_inference.py:770] Processed a batch of 100 ZMWs in 1.219 seconds
    I0507 08:58:25.896061 128752640931648 quick_inference.py:931] Processed 47100 ZMWs in 2753.829 seconds
    I0507 08:58:31.649722 128752640931648 quick_inference.py:693] Example summary: ran model=117 (18.34%; 0.252s) skip=521 (81.66%; 0.043s) total=638.
    I0507 08:58:31.660274 128752640931648 quick_inference.py:770] Processed a batch of 100 ZMWs in 1.349 seconds
    I0507 08:58:31.704504 128752640931648 quick_inference.py:931] Processed 47200 ZMWs in 2759.638 seconds
    Traceback (most recent call last):
      File "/home/gulderlab/miniconda3/envs/DC/bin/deepconsensus", line 8, in <module>
        sys.exit(run())
      File "/home/gulderlab/miniconda3/envs/DC/lib/python3.9/site-packages/deepconsensus/cli.py", line 118, in run
        app.run(main, flags_parser=parse_flags)
      File "/home/gulderlab/miniconda3/envs/DC/lib/python3.9/site-packages/absl/app.py", line 312, in run
        _run_main(main, args)
      File "/home/gulderlab/miniconda3/envs/DC/lib/python3.9/site-packages/absl/app.py", line 258, in _run_main
        sys.exit(main(argv))
      File "/home/gulderlab/miniconda3/envs/DC/lib/python3.9/site-packages/deepconsensus/cli.py", line 103, in main
        app.run(quick_inference.main, argv=passed)
      File "/home/gulderlab/miniconda3/envs/DC/lib/python3.9/site-packages/absl/app.py", line 312, in run
        _run_main(main, args)
      File "/home/gulderlab/miniconda3/envs/DC/lib/python3.9/site-packages/absl/app.py", line 258, in _run_main
        sys.exit(main(argv))
      File "/home/gulderlab/miniconda3/envs/DC/lib/python3.9/site-packages/deepconsensus/inference/quick_inference.py", line 977, in main
        outcome_counter = run()
      File "/home/gulderlab/miniconda3/envs/DC/lib/python3.9/site-packages/deepconsensus/inference/quick_inference.py", line 912, in run
        for zmw, subreads, dc_config, window_widths in input_file_generator:
      File "/home/gulderlab/miniconda3/envs/DC/lib/python3.9/site-packages/deepconsensus/inference/quick_inference.py", line 480, in stream_bam
        for input_data in proc_feeder():
      File "/home/gulderlab/miniconda3/envs/DC/lib/python3.9/site-packages/deepconsensus/preprocess/pre_lib.py", line 1309, in proc_feeder
        for read_set in subread_grouper:
      File "/home/gulderlab/miniconda3/envs/DC/lib/python3.9/site-packages/deepconsensus/preprocess/pre_lib.py", line 73, in __next__
        read = next(self.bam_reader)
      File "pysam/libcalignmentfile.pyx", line 1876, in pysam.libcalignmentfile.AlignmentFile.__next__
    OSError: error -3 while reading file

corkdagga commented 4 months ago

Hi @pichuan,

Just wanted to check in and see if you had any potential ideas for the errors described above?

pichuan commented 4 months ago

Hi @corkdagga, I agree that this looks like an issue with your input file. Can you check it?

By the way, I can't remember: did you try going through the Quick Start (using the inputs provided there) to confirm that your current setup works?
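If it helps, one way to sanity-check the BAMs is with samtools (a sketch, not from the DeepConsensus docs; pysam's "error -3" often points to a truncated or corrupted BGZF/BAM file):

    # Verify BAM integrity; prints the names of failing files and exits non-zero.
    samtools quickcheck -v PD049.CCS.actc.bam m54089_200615_125054.CCS.bam
    # Force a full read-through; a file truncated mid-stream errors out partway.
    samtools view PD049.CCS.actc.bam > /dev/null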

corkdagga commented 4 months ago

Hi @pichuan

I will have to check the file itself on the weekend (sorry), but I will get back to you about that.

To generate the files, I used the following commands based on the quick start:

srun ccs -j 12 --min-rq=0.88 m54089_200615_125054.subreads.bam PD049.CCS.bam

srun actc -j 12 m54089_200615_125054.subreads.bam PD049.CCS.bam PD049.CCS.actc.bam
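(The quick start shards these steps; for reference, a sharded ccs call would look roughly like this, using pbccs's --chunk option with illustrative shard numbers and output name:)

    # Illustrative only: shard 1 of 12; --chunk is a pbccs option and
    # requires a .pbi index on the input subreads BAM (created with pbindex).
    srun ccs -j 12 --min-rq=0.88 --chunk=1/12 m54089_200615_125054.subreads.bam PD049.CCS.1.bam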

and then ran DeepConsensus using the recommended input:

    deepconsensus run \
      --subreads_to_ccs=${shard_id}.subreads_to_ccs.bam \
      --ccs_bam=${shard_id}.ccs.bam \
      --checkpoint=model/checkpoint \

I am not performing any sharding, so for the --subreads_to_ccs and --ccs_bam arguments I am using the actc and ccs files generated earlier, respectively.

I am not sure whether this information helps you spot an error I am making. Otherwise, I will check the file on the weekend and try regenerating the ccs and actc files; perhaps that will help.

Thanks!

corkdagga commented 3 months ago

Hi @pichuan

I have done some more work on the problem. I think my issues are with the starting files and not with DeepConsensus. You can close the issue if you like. Thanks for all the help.