renzilin / NASTRA

Innovative Short Tandem Repeat Analysis through Cluster-Based Structure-Aware Algorithm in Nanopore Sequencing Data
GNU General Public License v3.0

ValueError: too many values to unpack (expected 2) #2

Open HLHsieh opened 1 month ago

HLHsieh commented 1 month ago

Hi there,

I was trying to execute NASTRA on my own data as follows:

python nastra.py call -b C9ORF72.sorted.bam  -o out

I got the error message:

[Processing]: 54it [00:01, 42.05it/s]
Traceback (most recent call last):
  File "/nfs/turbo/umms-kinfai/hsinlun/bin/NASTRA/NASTRA/nastra.py", line 154, in <module>
    main()
  File "/nfs/turbo/umms-kinfai/hsinlun/bin/NASTRA/NASTRA/nastra.py", line 15, in main
    args.func(args)
  File "/nfs/turbo/umms-kinfai/hsinlun/bin/NASTRA/NASTRA/nastra.py", line 138, in calling_func
    sample_name, locus = key.split('_')
ValueError: too many values to unpack (expected 2)

Any suggestions would be appreciated.

Best, Hsin

renzilin commented 1 month ago

Hi Hsin, In NASTRA, we provide config files for the STRs used in cell line authentication and forensic applications. They are stored at https://github.com/renzilin/NASTRA/blob/main/NASTRA/cfgs/panel_forenseq.csv and https://github.com/renzilin/NASTRA/blob/main/NASTRA/cfgs/repeat_structure.pat.

For your own locus, you need to generate the locus information file before using NASTRA.

If you have any further questions, please contact us.

Best, Zilin

HLHsieh commented 1 month ago

Hi Zilin,

Thank you for your explanation. I looked into the arguments, and I am wondering whether PANEL, FACTSHEET, and CONFIG are required for my job.

Although I defined my own config, the same error message returns.

python $sc/nastra.py call -b C9ORF72_1_9R_NanoSim_2x.sorted.bam -o out -f repeat_structure.pat -p panel_forenseq.csv -c threshold.cfg
[Processing]: 1it [00:00, 38.61it/s]
Traceback (most recent call last):
  File "/nfs/turbo/umms-kinfai/hsinlun/bin/NASTRA/NASTRA/nastra.py", line 154, in <module>
    main()
  File "/nfs/turbo/umms-kinfai/hsinlun/bin/NASTRA/NASTRA/nastra.py", line 15, in main
    args.func(args)
  File "/nfs/turbo/umms-kinfai/hsinlun/bin/NASTRA/NASTRA/nastra.py", line 138, in calling_func
    sample_name, locus = key.split('_')
ValueError: too many values to unpack (expected 2)

repeat_structure.pat

Loci Chrom.  Seq. Pattern  Publication  STRSeq BioProject
C9ORF72  9  [GGCCCC]n  NA. NA

panel_forenseq.csv

STR,CHROM,START,END,LEN,PREFIX,SUFFIX
C9ORF72,chr9,27573529,27573546,6,GGGCCCGCCCCGACCACGCCCCG,TAGCGCGCGACTCCTGA

threshold.cfg

locus,cov_0,cov_10,cov_15,cov_20,cov_25,cov_30,cov_50
C9ORF72,0.31,0.3,0.35,0.36,0.375,0.345,0.39

Best, Hsin

renzilin commented 1 month ago

Hi Hsin,

Would it be convenient for you to provide a sample file? If there is a file, I can test it quickly.

Best, Zilin

HLHsieh commented 1 month ago

Hi Zilin,

Sure. I put all files under: https://www.dropbox.com/scl/fo/yjxy2wvtt7fxf4trkat2v/AAbemqYvKvHnO6Wd0svQA-w?rlkey=mz8947kt97b2lonbjos3lq3o5&dl=0 Please let me know if there is any problem.

Best, Hsin

renzilin commented 1 month ago

@ElroyLR Please check the code. The files were already downloaded.

renzilin commented 1 month ago

Hi Hsin, Sorry for the late reply.

We found that the error goes away once the input BAM file is renamed to 'barcode01.bam', because we originally intended the pipeline to run directly on off-loaded data. We have also modified the configuration files: we used the human reference hg19 for the whole-genome alignment and for the locus position information.
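For reference, the unpacking error from the traceback can be reproduced in isolation. The sketch below assumes the key has the form `<sample>_<locus>`, which is only a guess based on the line `sample_name, locus = key.split('_')`; the example keys and the helper `split_key` are hypothetical and are not part of NASTRA.

```python
def split_key(key: str):
    """Split a '<sample>_<locus>' key on the LAST underscore only.

    Hypothetical helper: it tolerates underscores in the sample name,
    but it would still mis-split a locus name that contains '_'.
    """
    sample_name, locus = key.rsplit("_", 1)
    return sample_name, locus


# Hypothetical keys, assuming the key is built as '<sample>_<locus>'.
ok_key = "barcode01_C9ORF72"
bad_key = "C9ORF72_1_9R_NanoSim_2x_C9ORF72"

# One underscore: a plain split yields exactly two parts.
sample_name, locus = ok_key.split("_")

# Several underscores: a plain split yields more than two parts.
try:
    sample_name, locus = bad_key.split("_")
except ValueError as err:
    message = str(err)  # "too many values to unpack (expected 2)"

print(message)
print(split_key(bad_key))
```

Renaming the BAM file so that the sample name has no underscore avoids the problem without touching the code. Splitting on the last underscore is just one possible alternative.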

As for the threshold used to decide between homozygous and heterozygous calls, you may define it according to your data.

The files are attached. pattern.txt panel.csv

Best, Zilin

HLHsieh commented 3 weeks ago

Hi Zilin,

Thank you for your suggestions. The issue has been fixed, and I can execute NASTRA successfully. I have tried it on my 20 samples using the following command:

python $script call -b barcode01.bam -o ${myseq} -f repeat_structure.pat -p panel_forenseq.csv -c threshold.cfg --sncutoff 0

The threshold settings are:

locus,cov_0,cov_10,cov_15,cov_20,cov_25,cov_30,cov_50
C9ORF72,0.35,0.35,0.35,0.35,0.35,0.35,0.35

Only one sample analysis encountered this error:

Traceback (most recent call last):
  File "/nfs/turbo/umms-kinfai/hsinlun/bin/NASTRA/NASTRA/nastra.py", line 154, in <module>
    main()
  File "/nfs/turbo/umms-kinfai/hsinlun/bin/NASTRA/NASTRA/nastra.py", line 15, in main
    args.func(args)
  File "/nfs/turbo/umms-kinfai/hsinlun/bin/NASTRA/NASTRA/nastra.py", line 119, in calling_func
    merged_dat = pd.concat(results, axis=0)
  File "/nfs/turbo/umms-kinfai/hsinlun/miniconda3/envs/nastra_env/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 372, in concat
    op = _Concatenator(
  File "/nfs/turbo/umms-kinfai/hsinlun/miniconda3/envs/nastra_env/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 429, in __init__
    raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate

I would appreciate any solution you could provide for this issue.

Best, Hsin-Lun

renzilin commented 3 weeks ago

The error message is 'No objects to concatenate', so the results list is probably empty.
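A minimal sketch of this behaviour, using only `pandas.concat`: passing an empty list raises the same `ValueError`, and a guard in front of the call avoids it. The function name `concat_results` and the example row are made up for illustration and are not taken from nastra.py.

```python
import pandas as pd


def concat_results(results):
    """Concatenate per-locus DataFrames; return an empty frame if there are none."""
    if not results:
        # Nothing was produced for any locus, so there is nothing to merge.
        return pd.DataFrame()
    return pd.concat(results, axis=0)


# pd.concat on an empty list reproduces the error from the traceback.
try:
    pd.concat([], axis=0)
except ValueError as err:
    concat_error = str(err)

empty = concat_results([])

# Made-up row, only to show the non-empty path.
merged = concat_results([pd.DataFrame({"locus": ["C9ORF72"], "allele": [8]})])

print(concat_error)
print(empty.empty, len(merged))
```

Whether an empty result should be reported as "no call" or treated as an error is a design decision for the tool. The guard only makes the empty case explicit.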

HLHsieh commented 3 weeks ago

Hi Zilin,

In this case, can I conclude that NASTRA was not able to detect any reads related to this STR region? I have analyzed more samples and found that several had the same issue, but these samples should contain the STR.

I also encountered another issue:

Traceback (most recent call last):
  File "/nfs/turbo/umms-kinfai/hsinlun/bin/NASTRA/NASTRA/nastra.py", line 154, in <module>
    main()
  File "/nfs/turbo/umms-kinfai/hsinlun/bin/NASTRA/NASTRA/nastra.py", line 15, in main
    args.func(args)
  File "/nfs/turbo/umms-kinfai/hsinlun/bin/NASTRA/NASTRA/nastra.py", line 99, in calling_func
    cluster_alleles         = cluster_func.cluster(counter_dct)
  File "/nfs/turbo/umms-kinfai/hsinlun/bin/NASTRA/NASTRA/libs/pairwise_alignment.py", line 65, in cluster
    allele_dct = self.allele_init(part_group)
  File "/nfs/turbo/umms-kinfai/hsinlun/bin/NASTRA/NASTRA/libs/pairwise_alignment.py", line 76, in allele_init
    allele, supnum     = part_group[0]
IndexError: list index out of range

Do you have any ideas what caused this issue and how to fix it?

Best, Hsin-Lun

renzilin commented 2 weeks ago

What is the repeat structure in your reads that contain the STR? The part_group could be empty, which would mean that there are no cluster_alleles.
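The `IndexError` itself is what Python raises when indexing position 0 of an empty list. The sketch below assumes `part_group` is a list of `(allele, supnum)` tuples, which is inferred from the line `allele, supnum = part_group[0]` in the traceback. The function `first_allele` and the example values are hypothetical.

```python
from typing import List, Optional, Tuple


def first_allele(part_group: List[Tuple[str, int]]) -> Optional[Tuple[str, int]]:
    """Return the first (allele, supnum) pair, or None when the group is empty."""
    if not part_group:
        # No candidate alleles were collected for this locus.
        return None
    allele, supnum = part_group[0]
    return allele, supnum


# An empty list reproduces the error from the traceback.
try:
    allele, supnum = [][0]
except IndexError as err:
    index_error = str(err)

print(index_error)
print(first_allele([]))
print(first_allele([("[GGCCCC]8", 12), ("[GGCCCC]2", 3)]))  # made-up values
```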

HLHsieh commented 2 weeks ago

Hi Zilin,

I tried several times, and the same issue occurred. The repeat structure is CC [GGCCCC]264 TAG. I checked, and there are five reads supporting this region, but for some reason NASTRA did not consider them. I therefore guess that the error arises because NASTRA ends up with no reads supporting this region.
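One quick way to check how many consecutive motif copies a read carries is an exact-match regular expression, sketched below. This is not how NASTRA counts repeats. The helper `longest_repeat_run` is hypothetical, and because nanopore reads contain errors, exact matching will usually undercount on real data.

```python
import re


def longest_repeat_run(read: str, motif: str = "GGCCCC") -> int:
    """Return the largest number of consecutive exact copies of `motif` in `read`."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(read)]
    return max(runs, default=0)


# Idealised, error-free read with the structure CC [GGCCCC]264 TAG.
read = "CC" + "GGCCCC" * 264 + "TAG"
print(longest_repeat_run(read))  # 264

# A single mismatch splits the run, which is why real reads undercount.
broken = "GGCCCC" * 3 + "GGCACC" + "GGCCCC" * 5
print(longest_repeat_run(broken))  # 5
```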

Best, Hsin-Lun

renzilin commented 2 weeks ago

I think the clustering step may be aligning the true reads to some wrong reads that have the largest supporting number.
