Open HLHsieh opened 1 month ago
Hi there,
I was trying to execute NASTRA on my own data as follows:
python nastra.py call -b C9ORF72.sorted.bam -o out
I got the error message:
[Processing]: 54it [00:01, 42.05it/s]
Traceback (most recent call last):
File "/nfs/turbo/umms-kinfai/hsinlun/bin/NASTRA/NASTRA/nastra.py", line 154, in <module>
main()
File "/nfs/turbo/umms-kinfai/hsinlun/bin/NASTRA/NASTRA/nastra.py", line 15, in main
args.func(args)
File "/nfs/turbo/umms-kinfai/hsinlun/bin/NASTRA/NASTRA/nastra.py", line 138, in calling_func
sample_name, locus = key.split('_')
ValueError: too many values to unpack (expected 2)
Any suggestions would be appreciated.
Best, Hsin
Hi Hsin, NASTRA ships with config files for the STRs used in cell line authentication and forensic applications. They are stored at https://github.com/renzilin/NASTRA/blob/main/NASTRA/cfgs/panel_forenseq.csv and https://github.com/renzilin/NASTRA/blob/main/NASTRA/cfgs/repeat_structure.pat.
You need to generate the locus information file before using NASTRA.
If you have any further questions, please contact us.
Best, Zilin
Hi Zilin,
Thank you for your explanation. I looked into the arguments, and I am wondering whether PANEL, FACTSHEET, and CONFIG are required for my job.
Although I defined my own config, the same error message returns.
python $sc/nastra.py call -b C9ORF72_1_9R_NanoSim_2x.sorted.bam -o out -f repeat_structure.pat -p panel_forenseq.csv -c threshold.cfg
[Processing]: 1it [00:00, 38.61it/s]
Traceback (most recent call last):
File "/nfs/turbo/umms-kinfai/hsinlun/bin/NASTRA/NASTRA/nastra.py", line 154, in <module>
main()
File "/nfs/turbo/umms-kinfai/hsinlun/bin/NASTRA/NASTRA/nastra.py", line 15, in main
args.func(args)
File "/nfs/turbo/umms-kinfai/hsinlun/bin/NASTRA/NASTRA/nastra.py", line 138, in calling_func
sample_name, locus = key.split('_')
ValueError: too many values to unpack (expected 2)
repeat_structure.pat
Loci Chrom. Seq. Pattern Publication STRSeq BioProject
C9ORF72 9 [GGCCCC]n NA. NA
panel_forenseq.csv
STR,CHROM,START,END,LEN,PREFIX,SUFFIX
C9ORF72,chr9,27573529,27573546,6,GGGCCCGCCCCGACCACGCCCCG,TAGCGCGCGACTCCTGA
threshold.cfg
locus,cov_0,cov_10,cov_15,cov_20,cov_25,cov_30,cov_50
C9ORF72,0.31,0.3,0.35,0.36,0.375,0.345,0.39
Best, Hsin
Hi Hsin,
Would it be convenient for you to provide a sample file? If there is a file, I can test it quickly.
Best, Zilin
Hi Zilin,
Sure. I put all files under: https://www.dropbox.com/scl/fo/yjxy2wvtt7fxf4trkat2v/AAbemqYvKvHnO6Wd0svQA-w?rlkey=mz8947kt97b2lonbjos3lq3o5&dl=0 Please let me know if there is any problem.
Best, Hsin
@ElroyLR Please check the code. The files were already downloaded.
Hi Hsin, Sorry for the late reply.
We found that renaming the input BAM file to 'barcode01.bam' resolves the error, because we originally built the pipeline to run directly on off-loaded data. We have also modified the configuration files. In detail, we used the human reference hg19 for the whole-genome alignment as well as for the locus position information.
As for the threshold used to determine hom or het, you may define it according to your data.
The files are attached: pattern.txt, panel.csv
Best, Zilin
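The ValueError in the second run is consistent with underscores in the BAM file name. A minimal sketch of that failure mode, assuming the key has the form `<sample>_<locus>` (inferred only from the traceback line `sample_name, locus = key.split('_')`, not confirmed against the NASTRA source). The helper names `make_key`, `split_key_strict`, and `split_key_tolerant` are hypothetical and are not part of NASTRA:

```python
def make_key(sample_name: str, locus: str) -> str:
    # Assumed key layout "<sample>_<locus>" (hypothetical, inferred from the traceback).
    return f"{sample_name}_{locus}"


def split_key_strict(key: str):
    # Same unpacking as the failing line in the traceback:
    # it only works when the key contains exactly one underscore.
    sample_name, locus = key.split('_')
    return sample_name, locus


def split_key_tolerant(key: str):
    # Alternative sketch: split on the last underscore only, so underscores
    # in the sample name are kept. Assumes the locus name has no underscore.
    sample_name, locus = key.rsplit('_', 1)
    return sample_name, locus


if __name__ == "__main__":
    ok_key = make_key("barcode01", "C9ORF72")
    bad_key = make_key("C9ORF72_1_9R_NanoSim_2x", "C9ORF72")
    print(split_key_strict(ok_key))
    try:
        split_key_strict(bad_key)
    except ValueError as err:
        print("strict split failed:", err)
    print(split_key_tolerant(bad_key))
```

Under this assumption, a sample name without underscores such as barcode01 unpacks cleanly, which matches the renaming workaround. The `rsplit` variant is only one possible change and would itself break for locus names that contain an underscore.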
Hi Zilin,
Thank you for your suggestions. The issue has been fixed, and I can execute NASTRA successfully. I have tried it on my 20 samples using the following command:
python $script call -b barcode01.bam -o ${myseq} -f repeat_structure.pat -p panel_forenseq.csv -c threshold.cfg --sncutoff 0
The threshold settings are:
locus,cov_0,cov_10,cov_15,cov_20,cov_25,cov_30,cov_50
C9ORF72,0.35,0.35,0.35,0.35,0.35,0.35,0.35
Only one sample analysis encountered this error:
Traceback (most recent call last):
File "/nfs/turbo/umms-kinfai/hsinlun/bin/NASTRA/NASTRA/nastra.py", line 154, in <module>
main()
File "/nfs/turbo/umms-kinfai/hsinlun/bin/NASTRA/NASTRA/nastra.py", line 15, in main
args.func(args)
File "/nfs/turbo/umms-kinfai/hsinlun/bin/NASTRA/NASTRA/nastra.py", line 119, in calling_func
merged_dat = pd.concat(results, axis=0)
File "/nfs/turbo/umms-kinfai/hsinlun/miniconda3/envs/nastra_env/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 372, in concat
op = _Concatenator(
File "/nfs/turbo/umms-kinfai/hsinlun/miniconda3/envs/nastra_env/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 429, in __init__
raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate
I would appreciate any solution you could provide for this issue.
Best, Hsin-Lun
The error says 'No objects to concatenate', so the results list is probably empty.
Hi Zilin,
In this case, should I conclude that NASTRA was unable to detect any reads covering this STR region? I have analyzed more samples and found that several had issues, even though these samples should contain the STR.
Besides, I encountered another issue:
Traceback (most recent call last):
File "/nfs/turbo/umms-kinfai/hsinlun/bin/NASTRA/NASTRA/nastra.py", line 154, in <module>
main()
File "/nfs/turbo/umms-kinfai/hsinlun/bin/NASTRA/NASTRA/nastra.py", line 15, in main
args.func(args)
File "/nfs/turbo/umms-kinfai/hsinlun/bin/NASTRA/NASTRA/nastra.py", line 99, in calling_func
cluster_alleles = cluster_func.cluster(counter_dct)
File "/nfs/turbo/umms-kinfai/hsinlun/bin/NASTRA/NASTRA/libs/pairwise_alignment.py", line 65, in cluster
allele_dct = self.allele_init(part_group)
File "/nfs/turbo/umms-kinfai/hsinlun/bin/NASTRA/NASTRA/libs/pairwise_alignment.py", line 76, in allele_init
allele, supnum = part_group[0]
IndexError: list index out of range
Do you have any idea what caused this issue and how to fix it?
Best, Hsin-Lun
What is the repeat structure in your reads that contain the STR? The part_group could be empty, which would mean there are no cluster_alleles.
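A minimal sketch of how an empty part_group leads to the IndexError, and of one way to guard against it. It assumes part_group is a list of (allele, supporting read count) pairs, which is inferred only from the traceback line `allele, supnum = part_group[0]`. The function `first_allele` is hypothetical and is not NASTRA code:

```python
from typing import List, Optional, Tuple


def first_allele(part_group: List[Tuple[str, int]]) -> Optional[Tuple[str, int]]:
    """Return the first (allele, support) pair, or None when the group is empty."""
    if not part_group:
        # Indexing an empty list with [0] raises IndexError, so stop here
        # and let the caller report "no reads for this locus" instead.
        return None
    allele, supnum = part_group[0]
    return allele, supnum


if __name__ == "__main__":
    print(first_allele([]))
    print(first_allele([("GGCCCCGGCCCC", 5)]))
```

A guard like this only turns the crash into an explicit "no allele" result. It does not explain why the reads were dropped before clustering.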
Hi Zilin,
I tried several times, and the same issue occurred. The repeat structure is CC [GGCCCC]264 TAG. I checked, and there are five reads supporting this region. For some reason, NASTRA did not consider these reads. Therefore, I guess the error arises because NASTRA assumes that no reads support this region.
Best, Hsin-Lun
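One way to double-check read support independently of NASTRA is to count the longest run of the motif in each read sequence. The following is a rough sketch under stated assumptions: it takes read sequences as plain strings (extracting them from the BAM file is left out), it uses exact matching of GGCCCC, and the names `longest_repeat_run` and `count_supporting_reads` are hypothetical. Sequencing errors in long reads will break exact runs, so the counts are a lower bound at best:

```python
import re

MOTIF = "GGCCCC"  # repeat unit from the [GGCCCC]n pattern in this thread


def longest_repeat_run(read: str, motif: str = MOTIF) -> int:
    """Return the largest number of consecutive exact copies of motif in read."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(read)]
    return max(runs, default=0)


def count_supporting_reads(reads, min_copies: int, motif: str = MOTIF) -> int:
    """Count reads whose longest exact motif run has at least min_copies copies."""
    return sum(1 for r in reads if longest_repeat_run(r, motif) >= min_copies)


if __name__ == "__main__":
    toy_reads = [
        "CC" + MOTIF * 3 + "TAG",  # three uninterrupted copies
        "CC" + MOTIF * 1 + "TAG",  # a single copy
        "ACGTACGT",                # no copy
    ]
    print([longest_repeat_run(r) for r in toy_reads])
    print(count_supporting_reads(toy_reads, min_copies=3))
```

The toy reads are made up for illustration and are not taken from the samples discussed here.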
I think the clustering step may cause the true reads to be aligned to some wrong reads that have the largest supporting number?