Open yashi99 opened 8 months ago
I had a similar error reported once, it was due to a pandas version update, you can test it first with python /home/cisDynet_snakemake/atacqc/cal_tss_old.py cut_sites/hESC_rep1_q30_cut_sites.bed tss/TSS.pos tss/hESC_rep1_tss.csv tss/hESC_rep1_score.csv alone.
By the way, you'd better re-download atacqc as I updated some of the scripts today.
Thank you! I tried using the cal_tss_old.py but it returned the same error:
python /home/cisDynet_snakemake/atacqc/cal_tss_old.py cut_sites/hESC_rep1_q30_cut_sites.bed tss/TSS.pos test/hESC_rep1_tss.csv test/hESC_rep1_score.csv
join: Strand data from other will be added as strand data to self.
If this is undesired use the flag apply_strand_suffix=False.
To turn off the warning set apply_strand_suffix to True or False.
Traceback (most recent call last):
File "/home/cisDynet_snakemake/atacqc/cal_tss_old.py", line 17, in <module>
dis_fliter_for=dis[dis['Strand']=='+']
File "/home/systools/anaconda3/envs/cisDynet_pipeline/lib/python3.8/site-packages/pandas/core/frame.py", line 3761, in __getitem__
indexer = self.columns.get_loc(key)
File "/home/systools/anaconda3/envs/cisDynet_pipeline/lib/python3.8/site-packages/pandas/core/indexes/range.py", line 349, in get_loc
raise KeyError(key)
KeyError: 'Strand'
And then I tried the cal_tss_old.py and cal_tss.py from the re-downloaded atacqc. They did not work with the same errors. Then I just run the re-downloaded cisDynet_snakemake.py in a new folder, and got the error:
snakemake -s ../cisDynet_snakemake2/atacqc/cisDynet_snakemake.py -j 15
SyntaxError in line 75 of /home/cisDynet_snakemake2/atacqc/cisDynet_snakemake.py:
invalid syntax
Maybe I should first solve the problem of cal_tss.py or cal_tss_old.py?
Can you print a few lines from the TSS.pos file in the tss directory (created by this pipeline)? Also, I just updated cisDynet_snakemake.py.
Of course!
(cisDynet_pipeline) head -n 50 TSS.pos J01415.2 577 577 1 + gene J01415.2 648 648 1 + gene J01415.2 1602 1602 1 + gene J01415.2 1671 1671 1 + gene J01415.2 3230 3230 1 + gene J01415.2 3307 3307 1 + gene J01415.2 4263 4263 1 + gene J01415.2 4400 4400 1 - gene J01415.2 4402 4402 1 + gene J01415.2 4470 4470 1 + gene J01415.2 5512 5512 1 + gene J01415.2 5655 5655 1 - gene J01415.2 5729 5729 1 - gene J01415.2 5826 5826 1 - gene J01415.2 5891 5891 1 - gene J01415.2 5904 5904 1 + gene J01415.2 7514 7514 1 - gene J01415.2 7518 7518 1 + gene J01415.2 7586 7586 1 + gene J01415.2 8295 8295 1 + gene J01415.2 8366 8366 1 + gene J01415.2 8527 8527 1 + gene J01415.2 9207 9207 1 + gene J01415.2 9991 9991 1 + gene J01415.2 10059 10059 1 + gene
Can you run the following code and then take a screenshot to show me the output?
import pyranges as pr
import pandas as pd
tss_file=pd.read_table('tss/TSS.pos',usecols=[0,1,2,4],skiprows=1,names='Chromosome Start End Strand'.split())
tss_gr=pr.PyRanges(tss_file)
cut=pd.read_table('cut_sites/hESC_rep1_q30_cut_sites.bed', usecols=[0,1,2], names='Chromosome Start End'.split())
cut_gr=pr.PyRanges(cut)
dis=cut_gr.nearest(tss_gr).as_df()
dis.head()
Yes, and here are the results:
>>> import pyranges as pr
>>> import pandas as pd
>>> tss_file=pd.read_table('tss/TSS.pos',usecols=[0,1,2,4],skiprows=1,names='Chromosome Start End Strand'.split())
>>> tss_gr=pr.PyRanges(tss_file)
>>> cut=pd.read_table('cut_sites/hESC_rep1_q30_cut_sites.bed', usecols=[0,1,2], names='Chromosome Start End'.split())
>>> cut_gr=pr.PyRanges(cut)
>>> dis=cut_gr.nearest(tss_gr).as_df()
join: Strand data from other will be added as strand data to self.
If this is undesired use the flag apply_strand_suffix=False.
To turn off the warning set apply_strand_suffix to True or False.
>>> dis.head()
Empty DataFrame
Columns: []
Index: []
>>>
Can you show a few lines of your cut_sites/hESC_rep1_q30_cut_sites.bed?
Yes:
(cisDynet_pipeline) head cut_sites/hESC_rep1_q30_cut_sites.bed
chr1 10007 10008 1
chr1 10124 10125 1
chr1 10030 10031 1
chr1 10047 10048 1
chr1 10113 10114 1
chr1 10049 10050 1
chr1 10138 10139 1
chr1 10138 10139 1
chr1 10138 10139 1
chr1 17207 17208 1
And here are some head lines of each file
>>> tss_gr
+--------------+-----------+-----------+--------------+
| Chromosome | Start | End | Strand |
| (category) | (int64) | (int64) | (category) |
|--------------+-----------+-----------+--------------|
| J01415.2 | 648 | 648 | + |
| J01415.2 | 1602 | 1602 | + |
| J01415.2 | 1671 | 1671 | + |
| J01415.2 | 3230 | 3230 | + |
| ... | ... | ... | ... |
| J01415.2 | 7514 | 7514 | - |
| J01415.2 | 14673 | 14673 | - |
| J01415.2 | 14742 | 14742 | - |
| J01415.2 | 16023 | 16023 | - |
+--------------+-----------+-----------+--------------+
Stranded PyRanges object has 36 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> cut
Chromosome Start End
0 chr1 10007 10008
1 chr1 10124 10125
2 chr1 10030 10031
3 chr1 10047 10048
4 chr1 10113 10114
... ... ... ...
18638934 chr19_GL949752v1_alt 204499 204500
18638935 chr19_GL949752v1_alt 204595 204596
18638936 chr19_GL949753v2_alt 257133 257134
18638937 chr19_GL949753v2_alt 257164 257165
18638938 chr19_KI270938v1_alt 332376 332377
[18638939 rows x 3 columns]
>>> dis
Empty DataFrame
Columns: []
Index: []
>>> cut_gr.nearest(tss_gr)
join: Strand data from other will be added as strand data to self.
If this is undesired use the flag apply_strand_suffix=False.
To turn off the warning set apply_strand_suffix to True or False.
Empty PyRanges
>>> cut_gr
+------------------------+-----------+-----------+
| Chromosome | Start | End |
| (category) | (int64) | (int64) |
|------------------------+-----------+-----------|
| chr1 | 10007 | 10008 |
| chr1 | 10124 | 10125 |
| chr1 | 10030 | 10031 |
| chr1 | 10047 | 10048 |
| ... | ... | ... |
| chrY_KI270740v1_random | 18205 | 18206 |
| chrY_KI270740v1_random | 18419 | 18420 |
| chrY_KI270740v1_random | 20526 | 20527 |
| chrY_KI270740v1_random | 20711 | 20712 |
+------------------------+-----------+-----------+
Unstranded PyRanges object has 18,638,939 rows and 3 columns from 296 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
Is J01415.2 a chromosome number?发自我的 iPhone在 2023年10月27日,14:12,yashi99 @.***> 写道: Yes: (cisDynet_pipeline) head cut_sites/hESC_rep1_q30_cut_sites.bed chr1 10007 10008 1 chr1 10124 10125 1 chr1 10030 10031 1 chr1 10047 10048 1 chr1 10113 10114 1 chr1 10049 10050 1 chr1 10138 10139 1 chr1 10138 10139 1 chr1 10138 10139 1 chr1 17207 17208 1
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you commented.Message ID: @.***>
No, I am not sure why it shows JJ01415.2 in TSS.pos,so I search "J01415.2" in the .fna and .gff recorded in config.yaml:
(base) cat /home/data/genome/BWA_RNA_UCSCfa_hg38_index/GCA_000001405.15_GRCh38_full_analysis_set.fna | grep J01415
>chrM AC:J01415.2 gi:113200490 LN:16569 rl:Mitochondrion M5:c68f52674c9fb33aef52dcf399755519 AS:GRCh38 tp:circular
(base) cat /home/data/genome/BWA_RNA_UCSCfa_hg38_index/GCA_000001405.15_GRCh38_genomic.gff | grep J01415.2 | head
##sequence-region J01415.2 1 16569
J01415.2 Genbank region 1 16569 . + . ID=id479;Dbxref=taxon:9606;Is_circular=true;Name=MT;country=United Kingdom: Great Britain;gbkey=Src;genome=mitochondrion;isolation-source=caucasian;mol_type=genomic DNA;note=this is the rCRS;tissue-type=placenta
J01415.2 Genbank D_loop 16024 17145 . - . ID=id480;gbkey=D-loop
J01415.2 Genbank gene 577 647 . + . ID=gene0;Dbxref=GeneID:4558;Name=TRNF;gbkey=Gene;gene=TRNF;gene_synonym=MTTF
J01415.2 Genbank tRNA 577 647 . + . ID=rna0;Parent=gene0;Note=NAR: 1455;anticodon=(pos:611..613);codons=1;gbkey=tRNA;gene=TRNF;product=tRNA-Phe
J01415.2 Genbank exon 577 647 . + . ID=id481;Parent=rna0;Note=NAR: 1455;anticodon=(pos:611..613);codons=1;gbkey=tRNA;gene=TRNF;product=tRNA-Phe
J01415.2 Genbank gene 648 1601 . + . ID=gene1;Dbxref=GeneID:4549;Name=RNR1;gbkey=Gene;gene=RNR1;gene_synonym=MTRNR1
J01415.2 Genbank rRNA 648 1601 . + . ID=rna1;Parent=gene1;gbkey=rRNA;gene=RNR1;product=12S ribosomal RNA
J01415.2 Genbank exon 648 1601 . + . ID=id482;Parent=rna1;gbkey=rRNA;gene=RNR1;product=12S ribosomal RNA
J01415.2 Genbank gene 1602 1670 . + . ID=gene2;Dbxref=GeneID:4577;Name=TRNV;gbkey=Gene;gene=TRNV;gene_synonym=MTTV
It seems that those fna and gff downloaded from UCSC are questionable, so I will firstly turn to use my usual fa and gtf to build bwa index to run snakemake. Also I notice that in this .fna,J01415.2 seems to be another name of chrM?Is it because I set "organell_chr: chrM" in my config.yaml? I am still not sure what this parameter means. Is it means "only retain chrM results" or "only discard chrM results"?
Please send me part of the data (10000 rows of cut_sites/hESC_rep1_q30_cut_sites.bed ) and your gff file to my email (tzhubio@gmail.com) and I'll take a look at it specifically.
I have sent you the questionable files, and then I tried to run snakemake using my usual .fa and .gtf as mentioned before, and then I successfully got the results! Thank you so much!
Hi, when running
snakemake -s ./cisDynet_snakemake/atacqc/cisDynet_snakemake.py -j 15
,I find this error :I think there may be some problem during
python /home/cisDynet_snakemake/atacqc/cal_tss.py cut_sites/hESC_rep1_q30_cut_sites.bed tss/TSS.pos tss/hESC_rep1_tss.csv tss/hESC_rep1_score.csv
. when I tried to run it separately, it report error as below:Could you please give some advice?