google / deepconsensus

DeepConsensus uses gap-aware sequence transformers to correct errors in Pacific Biosciences (PacBio) Circular Consensus Sequencing (CCS) data.
BSD 3-Clause "New" or "Revised" License

Use of samtools and awk to filter incorrect alignments. #53

Closed ap1438 closed 1 year ago

ap1438 commented 1 year ago

I was trying to use the pipeline on one of my samples, but I do not understand the filtering process.

It is not in the quick start guide, but it is in the paper. Do I need to run this filtering step with the latest version of DeepConsensus, or has it been built into the new release?

Command from the paper:

    samtools view -h "aligned.subreads.bam" | \
    awk '{ if($1 ~ /^@/) { print; } else { split($1,A,"/"); \
    split($3,B,"/"); if(A[2]==B[2]) { split(A[3],C,"_"); \
    print $0 "\tqs:i:" C[1]; } } }'

Also, the input to this command is aligned.subreads.bam, but no output file is mentioned for this step.

- Will it produce an output? I assume it should, since samtools view is piped into awk.
- If it does, what format will the output be in? Currently it prints to the terminal, and the next step (the DeepConsensus run) takes a .bam file for subreads.aligned.bam. If I am not mistaken, awk will not produce a .bam file.

danielecook commented 1 year ago

@ap1438 if you are using actc then there should be no need to filter alignments using samtools and awk. actc only aligns subreads to their corresponding CCS sequence.

The commands featured in the supplement were used early on when we did not yet have actc. We would align subreads to CCS sequences using pbmm2. In general, subreads align correctly to their corresponding CCS sequence. However, there were some cases where subreads aligned to other CCS sequences and so we had to filter these out. Fortunately, we no longer have to do this.
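For readers who still want to see what that older filter did, here is a minimal sketch of the same awk logic run on toy input. It assumes the usual PacBio naming, where subreads are named movie/zmw/start_end and CCS reads movie/zmw/ccs. The function names `filter_same_zmw` and `toy_sam` and the records themselves are made up for illustration; this is not part of DeepConsensus.

```shell
# Hypothetical helper wrapping the awk filter quoted from the paper.
# Reads SAM text on stdin and writes filtered SAM text on stdout.
filter_same_zmw() {
  awk '{
    if ($1 ~ /^@/) {
      print                       # pass header lines through unchanged
    } else {
      split($1, A, "/")           # subread name: movie/zmw/start_end
      split($3, B, "/")           # reference name: movie/zmw/ccs
      if (A[2] == B[2]) {         # keep only alignments within the same ZMW
        split(A[3], C, "_")
        print $0 "\tqs:i:" C[1]   # append the subread start as a qs tag
      }
    }
  }'
}

# Toy input with made-up records; only columns 1 and 3 matter to the filter.
toy_sam() {
  printf '@HD\tVN:1.6\n'
  printf 'm1/10/0_500\t0\tm1/10/ccs\n'      # aligned to its own CCS read: kept
  printf 'm1/10/600_1100\t0\tm1/22/ccs\n'   # aligned to another ZMW: dropped
}

toy_sam | filter_same_zmw

# On real data the awk output is SAM text, so it would need converting back
# to BAM, for example (output file name is an assumption):
#   samtools view -h aligned.subreads.bam | filter_same_zmw \
#     | samtools view -b -o filtered.subreads.bam -
```

The filter emits plain SAM text, which is why the command from the paper prints to the terminal; piping it into a second `samtools view -b` is one way to get a BAM file back.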

I will close this now - but if you need any further clarification feel free to reopen. Thanks.

ap1438 commented 1 year ago

    docker run google/deepconsensus:1.1.0 deepconsensus run \
      --subreads_to_ccs=m54274Ue_220814_163631.aligned.subreads.bam \
      --ccs_bam=m54274Ue_220814_163631.hifi_S3_reads.bam \
      --checkpoint=model/checkpoint \
      --output=m54274Ue_220814_163631_deepcon.output.fastq

    Traceback (most recent call last):
      File "/opt/conda/envs/bio/bin/deepconsensus", line 8, in <module>
        sys.exit(run())
      File "/opt/conda/envs/bio/lib/python3.9/site-packages/deepconsensus/cli.py", line 111, in run
        app.run(main, flags_parser=parse_flags)
      File "/opt/conda/envs/bio/lib/python3.9/site-packages/absl/app.py", line 312, in run
        _run_main(main, args)
      File "/opt/conda/envs/bio/lib/python3.9/site-packages/absl/app.py", line 258, in _run_main
        sys.exit(main(argv))
      File "/opt/conda/envs/bio/lib/python3.9/site-packages/deepconsensus/cli.py", line 102, in main
        app.run(quick_inference.main, argv=passed)
      File "/opt/conda/envs/bio/lib/python3.9/site-packages/absl/app.py", line 312, in run
        _run_main(main, args)
      File "/opt/conda/envs/bio/lib/python3.9/site-packages/absl/app.py", line 258, in _run_main
        sys.exit(main(argv))
      File "/opt/conda/envs/bio/lib/python3.9/site-packages/deepconsensus/inference/quick_inference.py", line 842, in main
        outcome_counter = run()
      File "/opt/conda/envs/bio/lib/python3.9/site-packages/deepconsensus/inference/quick_inference.py", line 703, in run
        params = model_utils.read_params_from_json(checkpoint_path=FLAGS.checkpoint)
      File "/opt/conda/envs/bio/lib/python3.9/site-packages/deepconsensus/models/model_utils.py", line 405, in read_params_from_json
        json.load(tf.io.gfile.GFile(json_path, 'r')))
      File "/opt/conda/envs/bio/lib/python3.9/json/__init__.py", line 293, in load
        return loads(fp.read(),
      File "/opt/conda/envs/bio/lib/python3.9/site-packages/tensorflow/python/lib/io/file_io.py", line 116, in read
        self._preread_check()
      File "/opt/conda/envs/bio/lib/python3.9/site-packages/tensorflow/python/lib/io/file_io.py", line 77, in _preread_check
        self._read_buf = _pywrap_file_io.BufferedInputStream(
    tensorflow.python.framework.errors_impl.NotFoundError: model/params.json; No such file or directory

I am getting this error even though I have all the model files (checkpoint.data-00000-of-00001, checkpoint.index, and params.json) in the model directory.