nloyfer / UXM_deconv

Invalid Input Argument error during deconvolution #10

Open boryanakis opened 1 year ago

boryanakis commented 1 year ago

I am trying to use uxm deconv with my own data after making sure the program works with the tutorial data. I get the following error:

uxm deconv tmc-110_WGBS.sorted.dedup.STRIPPED.pat.gz -o uxm_tmc-110_WGBS.STRIPPED2.csv --debug
wgbstools homog -f --rlen 4 -b /shared/home/bskoseva/src/UXM_deconv/tmp_dir/l4/tmc-110_WGBS.sorted.dedup.STRIPPED.79byf_1v.bed /analysis/cloud_projects/research/GL_COVID_TMC_and_Peds/uxm_deconvolution/tmc-110_WGBS.sorted.dedup.STRIPPED.pat.gz --prefix /shared/home/bskoseva/src/UXM_deconv/tmp_dir/l4/tmc-110_WGBS.sorted.dedup.STRIPPED.79byf_1v -v
Warning: skipping an empty sample tmc-110_WGBS.sorted.dedup.STRIPPED
Invalid input argument
Length of values (2) does not match length of index (36)

I looked at issue #1 to troubleshoot on my own, but I don't see anything there that helps me understand this error or how to get past it. Here are the commands I used:

# set to the custom ref genome
$ wgbstools set_default_ref --name GRCh38 

Existing references:
=====
hg19
hg38
GRCh38 (default)

$ wgbstools bam2pat /analysis/projects/methyl/tmc-110_WGBS/tmc-110_WGBS.sorted.dedup.bam

# get the markers
$ tail -n +2 Atlas.U25.l4.hg38.full.tsv | cut -f1-5 > markers_Atlas.U25.l4.hg38.full.bed

# restrict to regions found in the atlas
$ wgbstools view -L markers_Atlas.U25.l4.hg38.full.bed tmc-110_WGBS.sorted.dedup.pat.gz --min_len 4 --strip --strict > tmc-110_WGBS.sorted.dedup.STRIPPED.pat

# check the content of the output
$ head tmc-110_WGBS.sorted.dedup.STRIPPED.pat
chr1    24569   CCCCCC  1
chr1    24569   CCCCCCCTCC  1
chr1    24569   CCCCCTTCC   1
chr1    24569   CCCT    1
chr1    24569   CCTCCCCCCCCCCC  1
chr1    24569   CCTCCCCTT   1
chr1    24569   TTTTTT  1
chr1    24572   CCCCCCC.C   1
chr1    24578   CCCCCC  1
chr1    63940   CCCCCC  1

# zip the pat file
$ gzip tmc-110_WGBS.sorted.dedup.STRIPPED.pat

# check the output of homog
$ wgbstools homog -b markers_Atlas.U25.l4.hg38.full.bed tmc-110_WGBS.sorted.dedup.STRIPPED.pat.gz
$ gunzip -c tmc-110_WGBS.sorted.dedup.STRIPPED.uxm.bed.gz | head 
chr1    1262136 1262432 24569   24584   1   1   7
chr1    2384160 2384745 63940   63960   0   0   25
chr1    5950648 5950918 133709  133715  0   0   10
chr1    5959258 5959335 133878  133884  0   0   8
chr1    7991117 7991683 173499  173512  3   2   15
chr1    9554214 9554463 199896  199905  0   0   12
chr1    10947269    10947539    226986  226992  1   0   9
chr1    11846130    11846567    242812  242825  0   0   14
chr1    14954541    14954609    282749  282754  0   1   14
chr1    20916695    20916823    376835  376841  0   0   8

How can I further troubleshoot?
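One way to narrow this down is to check how many reads in the stripped pat file are long enough to be counted at all, since the failing call uses `--rlen 4` and the warning reports an empty sample. The following is only a rough sketch, not part of wgbstools or UXM_deconv. It assumes the pat columns are chromosome, CpG index, methylation pattern, and read count, and that `.` marks a missing CpG. Whether homog ignores `.` when measuring read length is an assumption here. The function names and the last example line are hypothetical.

```python
import gzip


def summarize_pat(lines, min_cpgs=4):
    """Return (total_reads, reads_with_at_least_min_cpgs) for pat-formatted lines."""
    total = 0
    long_enough = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        pattern, count = fields[2], int(fields[3])
        total += count
        # Assumption: only observed CpGs count toward read length ('.' = missing).
        if len(pattern.replace(".", "")) >= min_cpgs:
            long_enough += count
    return total, long_enough


def summarize_pat_file(path, min_cpgs=4):
    """Same summary, reading directly from a gzip-compressed pat file."""
    with gzip.open(path, "rt") as handle:
        return summarize_pat(handle, min_cpgs)


# First three lines are taken from the post; the last one is a made-up short read.
example = [
    "chr1\t24569\tCCCCCC\t1",
    "chr1\t24569\tCCCT\t1",
    "chr1\t24572\tCCCCCCC.C\t1",
    "chr1\t63940\tCC.\t2",
]
print(summarize_pat(example))
```

If the second number is zero or very small for a real sample, that would be consistent with the "skipping an empty sample" warning, although it does not prove that this is the cause.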

AndriesDeKoker commented 1 year ago

Have you tried bgzip instead of gzip?

welyt commented 1 year ago

same error

sabrina-liedtke commented 9 months ago

I am experiencing the same error. If anyone has found a working solution, please let me know!

sabrina-liedtke commented 8 months ago

For anyone who comes across the same issue: for me it was solved by excluding samples that triggered the "all zeroes" error in the homog function, as well as samples with low read counts.
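The screening step described above could be sketched roughly as follows. This is not code from UXM_deconv or wgbstools. It assumes the last three columns of the homog output (as in the `.uxm.bed.gz` listing earlier in this thread) are the U, X, and M read counts per marker region. The function names and the `min_reads` threshold are hypothetical and would need tuning per dataset.

```python
import gzip


def uxm_totals(lines):
    """Sum the last three columns (assumed U, X, M read counts) over all regions."""
    totals = [0, 0, 0]
    for line in lines:
        fields = line.split()
        if len(fields) < 8 or line.startswith("#"):
            continue  # skip blank, malformed, or comment lines
        for i, value in enumerate(fields[-3:]):
            totals[i] += int(value)
    return totals


def is_usable(lines, min_reads=1000):
    """False for all-zero samples and for samples below the (hypothetical) threshold."""
    return sum(uxm_totals(lines)) >= max(min_reads, 1)


def is_usable_file(path, min_reads=1000):
    """Same check, reading directly from a gzip-compressed homog output file."""
    with gzip.open(path, "rt") as handle:
        return is_usable(handle, min_reads)


# Two regions copied from the homog output shown earlier in the thread.
covered = [
    "chr1\t1262136\t1262432\t24569\t24584\t1\t1\t7",
    "chr1\t2384160\t2384745\t63940\t63960\t0\t0\t25",
]
# A made-up sample with no reads in any marker region.
empty = ["chr1\t1262136\t1262432\t24569\t24584\t0\t0\t0"]

print(uxm_totals(covered), is_usable(covered, min_reads=10), is_usable(empty, min_reads=10))
```

Samples for which `is_usable_file` returns False could then be left out before running uxm deconv on the rest.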