kcleal / dysgu

Toolkit for calling structural variants using short or long reads
MIT License
100 stars 12 forks source link

Multiprocessing error #44

Closed mc-er closed 2 years ago

mc-er commented 2 years ago

I'm trying to run dysgu with the multiprocessing flag (-p), some samples runs smoothly but once in a while i get errors like this one:

2022-07-20 18:39:33,269 [INFO   ]  [dysgu-run] Version: 1.3.11
2022-07-20 18:39:35,108 [INFO   ]  run -v1 -p4 -c --regions /home/mimmie/glob/dysgu/chr1_1.bed --regions-only True -I 393,95,151 parsed.pabies-2.0.fa P9904-117_temp dedup.ir.P9904-117.bam
2022-07-20 18:39:35,108 [INFO   ]  Destination: P9904-117_temp
2022-07-20 18:39:35,108 [INFO   ]  Searching regions from /home/mimmie/glob/dysgu/chr1_1.bed
2022-07-20 18:41:13,640 [INFO   ]  dysgu fetch dedup.ir.P9904-117.bam written to P9904-117_temp/dedup.ir.P9904-117.dysgu_reads.bam, n=417399, time=0:01:38 h:m:s
2022-07-20 18:41:13,789 [INFO   ]  Input file is: P9904-117_temp/dedup.ir.P9904-117.dysgu_reads.bam
2022-07-20 18:41:13,791 [INFO   ]  Input file has no index, but --include was provided, attempting to index
2022-07-20 18:41:14,439 [INFO   ]  Sample name: P9904-117
2022-07-20 18:41:14,439 [INFO   ]  Writing vcf to stdout
2022-07-20 18:41:14,439 [INFO   ]  Running pipeline
2022-07-20 18:41:14,441 [INFO   ]  Read length 151, insert_median 393, insert stdev 95
2022-07-20 18:41:14,441 [INFO   ]  Max clustering dist 868
2022-07-20 18:41:14,442 [INFO   ]  Minimum support 3
2022-07-20 18:41:14,442 [INFO   ]  Building graph with clustering 868 bp
2022-07-20 18:41:29,817 [INFO   ]  Total input reads 533293
2022-07-20 18:48:00,408 [INFO   ]  Graph constructed
Process Process-1:
Traceback (most recent call last):
  File "/home/mimmie/.pyenv/versions/3.9.9/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/mimmie/.pyenv/versions/3.9.9/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "dysgu/cluster.pyx", line 813, in dysgu.cluster.process_job
  File "dysgu/cluster.pyx", line 761, in dysgu.cluster.component_job
  File "dysgu/call_component.pyx", line 1972, in dysgu.call_component.call_from_block_model
  File "dysgu/call_component.pyx", line 1986, in dysgu.call_component.call_from_block_model
  File "dysgu/call_component.pyx", line 1933, in dysgu.call_component.multi
  File "dysgu/call_component.pyx", line 1822, in dysgu.call_component.get_reads
  File "pysam/libcalignmentfile.pyx", line 1876, in pysam.libcalignmentfile.AlignmentFile.__next__
OSError: error -4 while reading file

If i run the sample without multiprocessing, it runs through without any problems. Since I have over 1000 samples with massive genomes it's not an option to run without multiprocessing so I'm grateful for any ideas and suggestions to what could cause this error and how to fix it.

kcleal commented 2 years ago

Hi @mc-er, Thanks for reporting this. It is quite a confusing error. It appears that the temp file created from dysgu fetch might be mangled some how. The error -4 while reading file can be traced back to a htslib function int bam_read1(BGZF *fp, bam1_t *b) from https://github.com/samtools/htslib/blob/develop/sam.c It looks like -4 can be returned in a few situations, mostly when the bam record is mangled.

Im not sure if the problem is with the temp bam file, or somthing happening during the multiprocessing step of dysgu call - Would you be able to run a quick test for me to try and work out where the problem is? Could you try running: dysgu call --procs 1 using the temp file located at P9904-117_temp/dedup.ir.P9904-117.dysgu_reads.bam ? If this fails then the problem is probably with the temp bam file. Thanks

mc-er commented 2 years ago

Hi @kcleal, Thanks for your quick response. I've done a run like you suggested, or I think I've done it in that way. It seems to run through fine. This is the output:

2022-07-21 16:53:57,331 [INFO   ]  [dysgu-fetch] Version: 1.3.11
2022-07-21 16:53:57,349 [INFO   ]  Searching regions from /home/mimmie/glob/dysgu/chr1_1.bed
2022-07-21 16:56:46,522 [INFO   ]  dysgu fetch dedup.ir.P9904-117.bam written to P9904-117_temp/dedup.ir.P9904-117.dysgu_reads.bam, n=418562, time=0:02:49 h:m:s
2022-07-21 16:56:48,729 [INFO   ]  [dysgu-call] Version: 1.3.11
2022-07-21 16:56:48,729 [INFO   ]  Input file is: P9904-117_temp/dedup.ir.P9904-117.dysgu_reads.bam
2022-07-21 16:56:48,730 [INFO   ]  call --procs 1 --overwrite parsed.pabies-2.0.fa P9904-117_temp P9904-117_temp/dedup.ir.P9904-117.dysgu_reads.bam
2022-07-21 16:56:48,814 [INFO   ]  Sample name: P9904-117
2022-07-21 16:56:48,814 [INFO   ]  Writing vcf to stdout
2022-07-21 16:56:48,814 [INFO   ]  Running pipeline
2022-07-21 16:56:49,583 [INFO   ]  Calculating insert size. Removed 3000 outliers with insert size >= 1218
2022-07-21 16:56:49,593 [INFO   ]  Inferred read length 151.0, insert median 359, insert stdev 151
2022-07-21 16:56:49,595 [INFO   ]  Max clustering dist 1114
2022-07-21 16:56:49,595 [INFO   ]  Minimum support 3
2022-07-21 16:56:49,597 [INFO   ]  Building graph with clustering 1114 bp
2022-07-21 16:56:58,767 [INFO   ]  Total input reads 418562
2022-07-21 16:56:59,004 [INFO   ]  Graph constructed
2022-07-21 16:57:39,632 [INFO   ]  Number of components 258723. N candidates 38010
2022-07-21 16:57:42,691 [INFO   ]  Re-alignment of soft-clips done. N candidates 10878
2022-07-21 16:57:43,061 [INFO   ]  Number of candidate SVs merged: 1898
2022-07-21 16:57:43,061 [INFO   ]  Number of candidate SVs after merge: 8980
2022-07-21 16:57:43,063 [INFO   ]  Number of candidate SVs dropped with sv-len < min-size or support < min support: 4779
2022-07-21 16:57:43,117 [INFO   ]  Number or SVs near gaps dropped 0
2022-07-21 16:57:43,135 [INFO   ]  Loaded n=1 chromosome coverage arrays from P9904-117_temp
/home/mimmie/.pyenv/versions/3.9.9/lib/python3.9/site-packages/sklearn/base.py:329: UserWarning: Trying to unpickle estimator LabelEncoder from version 0.23.2 when using version 1.1.1. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
2022-07-21 16:57:46,139 [INFO   ]  Model: pe, diploid: True, contig features: True. N features: 43
2022-07-21 16:57:47,932 [INFO   ]  dysgu call P9904-117_temp/dedup.ir.P9904-117.dysgu_reads.bam complete, n=4201, time=0:00:59 h:m:s

I don't know if this matters, but one thing that might be worth mentioning is that when I installed dysgu the htslib git was broken and resulted in an empty folder so I manually added the latest version of htslib.

kcleal commented 2 years ago

Thanks @mc-er, this is very useful. This brings up a few questions - Does the OSError: error -4 while reading file appear everytime you run that sample with -p4, or does it occur only sometimes? Additionally, in the second example you provided, did you run dysgu fetch with --procs 4 as for the first dysgu run example?

It seems 'git clone --recursive' does not work for newer htslib versions. I will get the readme updated to reflect that, thanks.

mc-er commented 2 years ago

For your first questing, for files where the OSError: error -4 while reading fileerror occurs it occurs every time -p4 is used. The second question I didn't use -p4 when running dysgu fetch, I can quickly try a run doing that.

mc-er commented 2 years ago

It also run through fine when running dysgu fetch -p4

2022-07-22 08:19:30,755 [INFO   ]  [dysgu-fetch] Version: 1.3.11
2022-07-22 08:19:30,755 [INFO   ]  Searching regions from /home/mimmie/glob/dysgu/chr1_1.bed
2022-07-22 08:28:28,928 [INFO   ]  dysgu fetch dedup.ir.P9904-117.bam written to P9904-117_temp/dedup.ir.P9904-117.dysgu_reads.bam, n=418562, time=0:08:58 h:m:s
2022-07-22 08:28:31,105 [INFO   ]  [dysgu-call] Version: 1.3.11
2022-07-22 08:28:31,105 [INFO   ]  Input file is: P9904-117_temp/dedup.ir.P9904-117.dysgu_reads.bam
2022-07-22 08:28:31,105 [INFO   ]  call --procs 1 --overwrite parsed.pabies-2.0.fa P9904-117_temp P9904-117_temp/dedup.ir.P9904-117.dysgu_reads.bam
2022-07-22 08:28:31,176 [INFO   ]  Sample name: P9904-117
2022-07-22 08:28:31,176 [INFO   ]  Writing vcf to stdout
2022-07-22 08:28:31,176 [INFO   ]  Running pipeline
2022-07-22 08:28:31,821 [INFO   ]  Calculating insert size. Removed 3000 outliers with insert size >= 1218
2022-07-22 08:28:31,828 [INFO   ]  Inferred read length 151.0, insert median 359, insert stdev 151
2022-07-22 08:28:31,829 [INFO   ]  Max clustering dist 1114
2022-07-22 08:28:31,829 [INFO   ]  Minimum support 3
2022-07-22 08:28:31,829 [INFO   ]  Building graph with clustering 1114 bp
2022-07-22 08:28:43,524 [INFO   ]  Total input reads 418562
2022-07-22 08:28:43,776 [INFO   ]  Graph constructed
2022-07-22 08:29:44,788 [INFO   ]  Number of components 258723. N candidates 38010
2022-07-22 08:29:48,568 [INFO   ]  Re-alignment of soft-clips done. N candidates 10878
2022-07-22 08:29:49,001 [INFO   ]  Number of candidate SVs merged: 1898
2022-07-22 08:29:49,001 [INFO   ]  Number of candidate SVs after merge: 8980
2022-07-22 08:29:49,004 [INFO   ]  Number of candidate SVs dropped with sv-len < min-size or support < min support: 4779
2022-07-22 08:29:49,046 [INFO   ]  Number or SVs near gaps dropped 0
2022-07-22 08:29:49,062 [INFO   ]  Loaded n=1 chromosome coverage arrays from P9904-117_temp
/home/mimmie/.pyenv/versions/3.9.9/lib/python3.9/site-packages/sklearn/base.py:329: UserWarning: Trying to unpickle estimator LabelEncoder from version 0.23.2 when using version 1.1.1. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
2022-07-22 08:29:52,989 [INFO   ]  Model: pe, diploid: True, contig features: True. N features: 43
2022-07-22 08:29:55,037 [INFO   ]  dysgu call P9904-117_temp/dedup.ir.P9904-117.dysgu_reads.bam complete, n=4201, time=0:01:23 h:m:s
kcleal commented 2 years ago

Thanks for running these tests. This might be a problem with dysgu, not htslib, im still a bit confused about the possible source at the moment. As the file does not look too big, would you mind sending me the dysgu_reads.bam file generated by dysgu fetch (via email clealk@cardiff.ac.uk)? This will make it much easier to find the issue, and also I can check if th error is reproducible (I will delete the file as soon as the issue is fixed). Is this a custom reference genome you are using also? Thanks for your input!


From: mc-er @.> Sent: Friday, July 22, 2022 8:34:41 AM To: kcleal/dysgu @.> Cc: Kez Cleal @.>; Mention @.> Subject: Re: [kcleal/dysgu] Multiprocessing error (Issue #44)

External email to Cardiff University - Take care when replying/opening attachments or links. Nid ebost mewnol o Brifysgol Caerdydd yw hwn - Cymerwch ofal wrth ateb/agor atodiadau neu ddolenni.

It also run through fine when running dysgu fetch -p4

2022-07-22 08:19:30,755 [INFO ] [dysgu-fetch] Version: 1.3.11 2022-07-22 08:19:30,755 [INFO ] Searching regions from /home/mimmie/glob/dysgu/chr1_1.bed 2022-07-22 08:28:28,928 [INFO ] dysgu fetch dedup.ir.P9904-117.bam written to P9904-117_temp/dedup.ir.P9904-117.dysgu_reads.bam, n=418562, time=0:08:58 h:m:s 2022-07-22 08:28:31,105 [INFO ] [dysgu-call] Version: 1.3.11 2022-07-22 08:28:31,105 [INFO ] Input file is: P9904-117_temp/dedup.ir.P9904-117.dysgu_reads.bam 2022-07-22 08:28:31,105 [INFO ] call --procs 1 --overwrite parsed.pabies-2.0.fa P9904-117_temp P9904-117_temp/dedup.ir.P9904-117.dysgu_reads.bam 2022-07-22 08:28:31,176 [INFO ] Sample name: P9904-117 2022-07-22 08:28:31,176 [INFO ] Writing vcf to stdout 2022-07-22 08:28:31,176 [INFO ] Running pipeline 2022-07-22 08:28:31,821 [INFO ] Calculating insert size. Removed 3000 outliers with insert size >= 1218 2022-07-22 08:28:31,828 [INFO ] Inferred read length 151.0, insert median 359, insert stdev 151 2022-07-22 08:28:31,829 [INFO ] Max clustering dist 1114 2022-07-22 08:28:31,829 [INFO ] Minimum support 3 2022-07-22 08:28:31,829 [INFO ] Building graph with clustering 1114 bp 2022-07-22 08:28:43,524 [INFO ] Total input reads 418562 2022-07-22 08:28:43,776 [INFO ] Graph constructed 2022-07-22 08:29:44,788 [INFO ] Number of components 258723. N candidates 38010 2022-07-22 08:29:48,568 [INFO ] Re-alignment of soft-clips done. N candidates 10878 2022-07-22 08:29:49,001 [INFO ] Number of candidate SVs merged: 1898 2022-07-22 08:29:49,001 [INFO ] Number of candidate SVs after merge: 8980 2022-07-22 08:29:49,004 [INFO ] Number of candidate SVs dropped with sv-len < min-size or support < min support: 4779 2022-07-22 08:29:49,046 [INFO ] Number or SVs near gaps dropped 0 2022-07-22 08:29:49,062 [INFO ] Loaded n=1 chromosome coverage arrays from P9904-117_temp /home/mimmie/.pyenv/versions/3.9.9/lib/python3.9/site-packages/sklearn/base.py:329: UserWarning: Trying to unpickle estimator LabelEncoder from version 0.23.2 when using version 1.1.1. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to: https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations warnings.warn( 2022-07-22 08:29:52,989 [INFO ] Model: pe, diploid: True, contig features: True. N features: 43 2022-07-22 08:29:55,037 [INFO ] dysgu call P9904-117_temp/dedup.ir.P9904-117.dysgu_reads.bam complete, n=4201, time=0:01:23 h:m:s

— Reply to this email directly, view it on GitHubhttps://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fkcleal%2Fdysgu%2Fissues%2F44%23issuecomment-1192275926&data=05%7C01%7Cclealk%40cardiff.ac.uk%7C41c9deed80d546fa79bf08da6bb4a499%7Cbdb74b3095684856bdbf06759778fcbc%7C1%7C0%7C637940720840722665%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=o9qMIDJu%2BILVo%2F9CQRZ1kHfmEfzCpSuLaD%2BOgbckWcM%3D&reserved=0, or unsubscribehttps://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAKIBQHIGH5FMTFPFCLVKGFDVVJFJDANCNFSM54GHRIQA&data=05%7C01%7Cclealk%40cardiff.ac.uk%7C41c9deed80d546fa79bf08da6bb4a499%7Cbdb74b3095684856bdbf06759778fcbc%7C1%7C0%7C637940720840722665%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=UfFurh03JqRLYh2kqbPp2SKXqo8v53n3jhGadCjRwQU%3D&reserved=0. You are receiving this because you were mentioned.Message ID: @.***>

mc-er commented 2 years ago

I've done another test run out of curiosity, I ran dysgu fetch --procs 4 followed by dysgu call --procs 4 using the dysgu_reads.bam file which ran through fine. So I ran dysgu run --procs 4 and the error appeared again.

#########################
# running dysgu fetch --procs 4
2022-07-22 11:27:35,847 [INFO   ]  [dysgu-fetch] Version: 1.3.11
2022-07-22 11:27:35,848 [INFO   ]  Searching regions from /home/mimmie/glob/dysgu/chr1_1.bed
2022-07-22 11:32:17,281 [INFO   ]  dysgu fetch dedup.ir.P9904-117.bam written to P9904-117_temp1/dedup.ir.P9904-117.dysgu_reads.bam, n=418562, time=0:04:41 h:m:s

#########################
# running dysgu call --procs 4
2022-07-22 11:32:19,362 [INFO   ]  [dysgu-call] Version: 1.3.11
2022-07-22 11:32:19,362 [INFO   ]  Input file is: P9904-117_temp1/dedup.ir.P9904-117.dysgu_reads.bam
2022-07-22 11:32:19,362 [INFO   ]  call --procs 4 --overwrite parsed.pabies-2.0.fa P9904-117_temp1 P9904-117_temp1/dedup.ir.P9904-117.dysgu_reads.bam
2022-07-22 11:32:19,479 [INFO   ]  Sample name: P9904-117
2022-07-22 11:32:19,479 [INFO   ]  Writing vcf to stdout
2022-07-22 11:32:19,479 [INFO   ]  Running pipeline
2022-07-22 11:32:19,983 [INFO   ]  Calculating insert size. Removed 3000 outliers with insert size >= 1218
2022-07-22 11:32:19,989 [INFO   ]  Inferred read length 151.0, insert median 359, insert stdev 151
2022-07-22 11:32:19,990 [INFO   ]  Max clustering dist 1114
2022-07-22 11:32:19,991 [INFO   ]  Minimum support 3
2022-07-22 11:32:19,991 [INFO   ]  Building graph with clustering 1114 bp
2022-07-22 11:32:31,183 [INFO   ]  Total input reads 418562
2022-07-22 11:32:31,418 [INFO   ]  Graph constructed
2022-07-22 11:32:48,114 [INFO   ]  Number of components 258723. N candidates 37990
2022-07-22 11:32:51,954 [INFO   ]  Re-alignment of soft-clips done. N candidates 10860
2022-07-22 11:32:52,383 [INFO   ]  Number of candidate SVs merged: 1901
2022-07-22 11:32:52,384 [INFO   ]  Number of candidate SVs after merge: 8959
2022-07-22 11:32:52,386 [INFO   ]  Number of candidate SVs dropped with sv-len < min-size or support < min support: 4753
2022-07-22 11:32:52,425 [INFO   ]  Number or SVs near gaps dropped 1
2022-07-22 11:32:52,439 [INFO   ]  Loaded n=1 chromosome coverage arrays from P9904-117_temp1
/home/mimmie/.pyenv/versions/3.9.9/lib/python3.9/site-packages/sklearn/base.py:329: UserWarning: Trying to unpickle estimator LabelEncoder from version 0.23.2 when using version 1.1.1. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
2022-07-22 11:32:56,385 [INFO   ]  Model: pe, diploid: True, contig features: True. N features: 43
2022-07-22 11:32:58,515 [INFO   ]  dysgu call P9904-117_temp1/dedup.ir.P9904-117.dysgu_reads.bam complete, n=4205, time=0:00:39 h:m:s

#########################
# running dysgu run --procs 4
2022-07-22 11:33:00,461 [INFO   ]  [dysgu-run] Version: 1.3.11
2022-07-22 11:33:00,461 [INFO   ]  run --procs 4 -c --regions /home/mimmie/glob/dysgu/chr1_1.bed --regions-only True parsed.pabies-2.0.fa P9904-117_temp2 dedup.ir.P9904-117.bam
2022-07-22 11:33:00,461 [INFO   ]  Destination: P9904-117_temp2
2022-07-22 11:33:00,461 [INFO   ]  Searching regions from /home/mimmie/glob/dysgu/chr1_1.bed
2022-07-22 11:33:09,437 [INFO   ]  dysgu fetch dedup.ir.P9904-117.bam written to P9904-117_temp2/dedup.ir.P9904-117.dysgu_reads.bam, n=417399, time=0:00:08 h:m:s
2022-07-22 11:33:09,499 [INFO   ]  Input file is: P9904-117_temp2/dedup.ir.P9904-117.dysgu_reads.bam
2022-07-22 11:33:09,500 [INFO   ]  Input file has no index, but --include was provided, attempting to index
2022-07-22 11:33:10,074 [INFO   ]  Sample name: P9904-117
2022-07-22 11:33:10,074 [INFO   ]  Writing vcf to stdout
2022-07-22 11:33:10,074 [INFO   ]  Running pipeline
2022-07-22 11:33:10,802 [INFO   ]  Calculating insert size. Removed 193 outliers with insert size >= 851.0
2022-07-22 11:33:10,818 [INFO   ]  Inferred read length 151.0, insert median 387, insert stdev 87
2022-07-22 11:33:10,819 [INFO   ]  Max clustering dist 822
2022-07-22 11:33:10,819 [INFO   ]  Minimum support 3
2022-07-22 11:33:10,819 [INFO   ]  Building graph with clustering 822 bp
2022-07-22 11:33:30,749 [INFO   ]  Total input reads 533293
2022-07-22 11:42:06,693 [INFO   ]  Graph constructed
Process Process-1:
Traceback (most recent call last):
  File "/home/mimmie/.pyenv/versions/3.9.9/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/mimmie/.pyenv/versions/3.9.9/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "dysgu/cluster.pyx", line 813, in dysgu.cluster.process_job
  File "dysgu/cluster.pyx", line 761, in dysgu.cluster.component_job
  File "dysgu/call_component.pyx", line 1972, in dysgu.call_component.call_from_block_model
  File "dysgu/call_component.pyx", line 1986, in dysgu.call_component.call_from_block_model
  File "dysgu/call_component.pyx", line 1933, in dysgu.call_component.multi
  File "dysgu/call_component.pyx", line 1822, in dysgu.call_component.get_reads
  File "pysam/libcalignmentfile.pyx", line 1876, in pysam.libcalignmentfile.AlignmentFile.__next__
OSError: error -4 while reading file

Of course I can send you the file. Happy to help to sort out the error! Given the new information would you want one that is created during dysgu run?

kcleal commented 2 years ago

Great, that is quite revealing. It looks like the error is coming from the --regions / --regions-only True options. I think I have enough to try and find the error now. In the mean time, I can recommend running dysgu as a two step pipeline (avoid using dysgu run) as you have above. So use dysgu fetch -p4 --search ... followed by dysgu call .... This should give equivalent results of the run pipeline anyway

mc-er commented 2 years ago

A side note. I started a run as you suggested and got this Exception: --regions-only is not currently supported via the 'call' command error

kcleal commented 2 years ago

I can see some issues with how the --regions-only True option is treated - I think these have arisen with more recent versions of dysgu, my apologies. Since you have run dysgu fetch --search this hard filters the genome for the target regions so --regions-only will be redundant in this case, and will not be needed for the subsequent dysgu call command.

kcleal commented 2 years ago

Further to my last message, I think the subsequent dysgu call the --ibam option should also be filled out; this will flag that the current input bam file was derived from dysgu fetch, as well as providing a better sample for insert size calculation.

kcleal commented 2 years ago

I believe the error has now been resolved. You will have to use git clone and build dysgu from source whilst I get a release uploaded to pypi. There was a problem with the pipeline when using --regions-only True. This option is now only supported when using dysgu call, and it now behaves as expected i.e. when '--regions-only True' is set, the input regions are scanned and mate-pair information is used to try and add additional regions to search. I have added a warning in the readme that this can lead to high memory usage if the input regions are large. Hope this resolved things for you.

mc-er commented 2 years ago

Fantastic! Thank you very much for all the help!

kcleal commented 2 years ago

I will close this for now, but please re-open if any issues arise