assetdaniyarov closed this issue 7 months ago.
setup.py (note the requirement 'pandas>=1.1.0'):
from setuptools import setup, find_packages

setup(
    name='NanoPsiPy',
    version='1.0',
    packages=find_packages(),
    include_package_data=True,
    scripts=["bin/NanoPsiPy_estimation", "bin/NanoPsiPy_comparison"],
    install_requires=[
        'numpy>=1.24.0',
        'pandas>=1.1.0',
    ],
    license="GPL 3.0"
)
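To check whether an environment meets the minimum versions in install_requires above, a stdlib-only sketch can compare installed versions via importlib.metadata. This is a hypothetical helper, not part of NanoPsiPy. The version parsing is deliberately simplified (numeric release segments only, not full PEP 440), and scipy is listed only on the assumption that it is needed because chi_square.py imports scipy.stats; setup.py itself does not list it.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions taken from install_requires above. scipy has no listed
# minimum; it is included only because chi_square.py imports scipy.stats.
REQUIREMENTS = {"numpy": "1.24.0", "pandas": "1.1.0", "scipy": None}


def release_tuple(ver):
    """Parse leading numeric release segments, e.g. '1.26.0rc1' -> (1, 26, 0).

    Simplified: pre/post/dev suffixes are dropped, so this is not a full
    PEP 440 comparison.
    """
    parts = []
    for piece in ver.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            break  # stop after a segment that carries a suffix such as 'rc1'
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if `installed` >= `minimum`; a minimum of None accepts anything."""
    if minimum is None:
        return True
    have, need = release_tuple(installed), release_tuple(minimum)
    width = max(len(have), len(need))
    # Pad with zeros so that '1.24' compares equal to '1.24.0'.
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def check_requirements(requirements=REQUIREMENTS):
    """Map each package to (installed version or None, requirement satisfied)."""
    report = {}
    for name, minimum in requirements.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = (None, False)
        else:
            report[name] = (installed, meets_minimum(installed, minimum))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_requirements().items():
        status = "ok" if ok else "missing or too old"
        print(f"{name}: {installed or 'not installed'} ({status})")
```

For example, the versions in the conda list later in this thread (numpy 1.24.0, pandas 2.1.0) satisfy both listed minimums.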
Hello, which reference file did you use? Please use -d genome if it is a GENCODE genome reference file, or -d transcriptome if it is a GENCODE transcriptome reference file. Have you tested NanoPsiPy_comparison with the example data?
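If you are unsure which kind of reference you used, the first FASTA header usually tells you. The following is a heuristic sketch with hypothetical function names. It assumes GENCODE transcript FASTA headers are '|'-separated and start with an Ensembl transcript ID (ENST...), while GENCODE genome FASTA headers name a chromosome (e.g. "chr1 1"). It is not how NanoPsiPy itself decides anything.

```python
import gzip


def first_fasta_header(path):
    """Return the first header line of a FASTA file (plain or .gz), without '>'."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                return line[1:].strip()
    raise ValueError(f"no FASTA header found in {path}")


def guess_data_type_from_header(header):
    """Suggest 'transcriptome' or 'genome' for the -d option (heuristic).

    Assumption: GENCODE transcript headers are '|'-separated and begin with
    an ENST transcript ID; anything else is treated as a genome header.
    """
    fields = header.lstrip(">").split("|")
    if len(fields) > 1 and fields[0].startswith("ENST"):
        return "transcriptome"
    return "genome"


def guess_data_type(path):
    """Read the first header of `path` and apply the heuristic above."""
    return guess_data_type_from_header(first_fasta_header(path))
```

For instance, guess_data_type("/data/PublicData/rna_refseq/gencode.v45.transcripts.fa") should suggest "transcriptome" if that file follows the usual GENCODE transcript header layout.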
Thank you for your feedback; it worked for me.
Can you please tell me whether, after the NanoPsiPy_comparison command, it is necessary to run merge_script.py or chi_square.py?
I didn't get a p-value in the results table NanoPsiPy_estimation_Control_P2_vs_NanoPsiPy_estimation_AA_P2.csv (a chunk is attached below).
Could you please provide me with some guidance?
ID,gene_id,havana_gene,havana_transcript,transcript_name,gene_name,ontology_id,RNA_feature,Direction,position,base_type,control_coverage,control_misC,control_C_reads,control_T_reads,treatment_coverage,treatment_misC,treatment_C_reads,treatment_T_reads
ENST00000327044.7,ENSG00000188976.11,OTTHUMG00000040720.2,OTTHUMT00000097869.2,NOC2L-201,NOC2L,2757,protein_coding,_F,1479,T,9.0,0.0,0.0,9.0,72.0,0.0,0.0,72.0
ENST00000327044.7,ENSG00000188976.11,OTTHUMG00000040720.2,OTTHUMT00000097869.2,NOC2L-201,NOC2L,2757,protein_coding,_F,1500,T,9.0,0.0,0.0,9.0,73.0,0.0410958904109589,3.0,70.0
ENST00000327044.7,ENSG00000188976.11,OTTHUMG00000040720.2,OTTHUMT00000097869.2,NOC2L-201,NOC2L,2757,protein_coding,_F,1505,T,9.0,0.0,0.0,9.0,73.0,0.0,0.0,73.0
ENST00000327044.7,ENSG00000188976.11,OTTHUMG00000040720.2,OTTHUMT00000097869.2,NOC2L-201,NOC2L,2757,protein_coding,_F,1515,T,9.0,0.0,0.0,9.0,73.0,0.0273972602739726,2.0,71.0
ENST00000327044.7,ENSG00000188976.11,OTTHUMG00000040720.2,OTTHUMT00000097869.2,NOC2L-201,NOC2L,2757,protein_coding,_F,1520,T,10.0,0.0,0.0,10.0,73.0,0.0136986301369863,1.0,72.0
ENST00000327044.7,ENSG00000188976.11,OTTHUMG00000040720.2,OTTHUMT00000097869.2,NOC2L-201,NOC2L,2757,protein_coding,_F,1521,T,10.0,0.0,0.0,10.0,73.0,0.0684931506849315,5.0,68.0
ENST00000327044.7,ENSG00000188976.11,OTTHUMG00000040720.2,OTTHUMT00000097869.2,NOC2L-201,NOC2L,2757,protein_coding,_F,1523,T,10.0,0.0,0.0,10.0,73.0,0.0958904109589041,7.0,66.0
ENST00000327044.7,ENSG00000188976.11,OTTHUMG00000040720.2,OTTHUMT00000097869.2,NOC2L-201,NOC2L,2757,protein_coding,_F,1527,T,11.0,0.0,0.0,11.0,73.0,0.0410958904109589,3.0,70.0
/data/pipeline/NanoPsiPy/bin/NanoPsiPy_comparison \
-c /data/dRNA/NanoPsiPy/Control_P2/NanoPsiPy_estimation_Control_P2.csv \
-t /data/dRNA/NanoPsiPy/AA_P2/NanoPsiPy_estimation_AA_P2.csv \
-o /data/dRNA/NanoPsiPy/NanoPsiPy_comparison_Control_P2_vs_AA_P2 \
-d transcriptome
The 'control_misC' values were not filtered as specified in the script chi_square.py:
import pandas as pd
from scipy.stats import chi2_contingency

def analyze(data):
    # Load the data (this replaces whatever was passed in as `data`)
    data = pd.read_csv('/data/dRNA/NanoPsiPy/NanoPsiPy_comparison_Control_P2_vs_AA_P2/NanoPsiPy_estimation_Control_P2_vs_NanoPsiPy_estimation_AA_P2.csv')
    # Drop rows where control_misC is less than 0.10
    data = data[data['control_misC'] >= 0.10]
To compare these two samples, please run the NanoPsiPy_comparison command. NanoPsiPy_comparison runs both the merge and the chi-square Python scripts, so you do not need to run them separately.
From: aset8
Sent: Thursday, February 22, 2024 10:12:05 PM
To: vetmohit89/NanoPsiPy
Subject: Re: [vetmohit89/NanoPsiPy] utils.py, line 390, ValueError("Columns must be same length as key") (Issue #1)
NanoPsiPy drops any site with misC below 0.10 as noise. Your data have misC values below 0.10 (i.e., less than 10%), which could be the reason you don't see any p-value.
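To see the effect of that threshold on the chunk posted above, here is a small stdlib-only sketch that applies the same condition as data[data['control_misC'] >= 0.10]. The rows are a trimmed copy of that chunk (three columns only), and the function name is hypothetical.

```python
import csv
import io

# Trimmed copy of rows from the comparison CSV quoted earlier in this thread
# (only the columns needed here).
SAMPLE = """position,control_misC,treatment_misC
1479,0.0,0.0
1500,0.0,0.0410958904109589
1521,0.0,0.0684931506849315
1523,0.0,0.0958904109589041
"""

MIN_MISC = 0.10  # sites with misC below 10% are treated as noise


def sites_passing_filter(csv_text, column="control_misC", threshold=MIN_MISC):
    """Keep rows whose `column` is >= threshold, mirroring
    data[data['control_misC'] >= 0.10] from chi_square.py."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [row for row in rows if float(row[column]) >= threshold]


print(len(sites_passing_filter(SAMPLE)))                           # 0
print(len(sites_passing_filter(SAMPLE, column="treatment_misC")))  # 0
```

Every control_misC value in the chunk is 0.0, and even the highest treatment_misC (about 0.096) stays below 0.10, so no site survives to be tested.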
Thank you very much.
The p-value is in control_vs_treatment_result.csv, not in NanoPsiPy_estimation_Control_P2_vs_NanoPsiPy_estimation_AA_P2.csv.
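Only a fragment of chi_square.py appears in this thread. It imports scipy.stats.chi2_contingency, and the comparison CSV carries C and T read counts per site, so a 2x2 test of C versus T reads in control versus treatment is a plausible reading, but that is an assumption. The sketch below is a pure-Python stand-in, not the package's code. It computes the Pearson statistic with the continuity correction that chi2_contingency applies to 2x2 tables by default, and a p-value from the chi-square distribution with 1 degree of freedom.

```python
import math


def chi_square_2x2(control_c, control_t, treatment_c, treatment_t, yates=True):
    """Pearson chi-square test on the 2x2 table
        [[control_c,   control_t],
         [treatment_c, treatment_t]]
    Returns (statistic, p_value), or None when a row or column total is zero
    (an expected count would be zero, so the test is undefined).
    With yates=True each |observed - expected| is reduced by min(0.5, |O - E|),
    the continuity correction scipy's chi2_contingency uses for 2x2 tables.
    """
    table = [[control_c, control_t], [treatment_c, treatment_t]]
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_sums)
    if total == 0 or 0 in row_sums or 0 in col_sums:
        return None
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            diff = abs(table[i][j] - expected)
            if yates:
                diff -= min(0.5, diff)
            stat += diff * diff / expected
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

For example, chi_square_2x2(0, 10, 7, 66) tests position 1523 from the chunk above, while chi_square_2x2(0, 9, 0, 72) (position 1479) returns None because neither sample has any C reads there.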
Hello,
I encountered an issue while running a script in NanoPsiPy for comparing control and treatment datasets using transcriptome data. Here are the details of the error:
Error Message:
Traceback (most recent call last):
  File "/data/pipeline/NanoPsiPy/bin/NanoPsiPy_comparison", line 31, in <module>
    merge_and_analyze(args.control_file, args.treatment_file, args.output_folder, args.data_type)
  File "/data/pipeline/NanoPsiPy/bin/NanoPsiPy_comparison", line 14, in merge_and_analyze
    analyze(merge_new)
  File "/data/miniconda_2024/envs/nano/lib/python3.9/site-packages/NanoPsiPy/chi_square.py", line 6, in analyze
    data = pd.read_csv('merged.csv')
  File "/data/miniconda_2024/envs/nano/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 948, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/data/miniconda_2024/envs/nano/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 611, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/data/miniconda_2024/envs/nano/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1448, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/data/miniconda_2024/envs/nano/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1705, in _make_engine
    self.handles = get_handle(
  File "/data/miniconda_2024/envs/nano/lib/python3.9/site-packages/pandas/io/common.py", line 863, in get_handle
    handle = open(
FileNotFoundError: [Errno 2] No such file or directory: 'merged.csv'
Command:
/data/pipeline/NanoPsiPy/bin/NanoPsiPy_comparison \
-c /data/dRNA/RNA_MDA-MB-231_1repeat_27062024/NanoPsiPy/GM/NanoPsiPy_estimation_GM.csv \
-t /data/dRNA/RNA_MDA-MB-231_1repeat_27062024/NanoPsiPy/AA-/NanoPsiPy_estimation_AA-.csv \
-o /data/dRNA/RNA_MDA-MB-231_1repeat_27062024/NanoPsiPy/NanoPsiPy_comparison_Control_GM_vs_AA- \
-d transcriptome
Estimating NanoPsi Values for Treatment (AA-)
input=/data/RNA_MDA-MB-231_1repeat_27062024/AA-/20240627_1437_4E_PAI53966_7a639891/fastq/
reference=/data/PublicData/rna_refseq/gencode.v45.transcripts.fa
/data/pipeline/NanoPsiPy/bin/NanoPsiPy_estimation \
-i $input \
-r $reference \
-o /data/dRNA/RNA_MDA-MB-231_1repeat_27062024/NanoPsiPy/AA-/NanoPsiPy_estimation_AA-.csv \
-s treatment
Estimating NanoPsi Values for Control (GM)
input=/data/RNA_MDA-MB-231_1repeat_27062024/GM/20240627_1437_4B_PAI54106_ed2a8caa/fastq/
reference=/data/PublicData/rna_refseq/gencode.v45.transcripts.fa
/data/pipeline/NanoPsiPy/bin/NanoPsiPy_estimation \
-i $input \
-r $reference \
-o /data/dRNA/RNA_MDA-MB-231_1repeat_27062024/NanoPsiPy/GM/NanoPsiPy_estimation_GM.csv \
-s control
Conda list:
# Name       Version   Build               Channel
minimap2     2.18      h5bf99c6_0          https://anaconda.org/bioconda/minimap2/2.18/download
nanopsipy    1.0       pypi_0              pypi
numpy        1.24.0    py39h223a676_0      conda-forge
pandas       2.1.0     py39hddac248_0      conda-forge
pip          24.0      pyhd8ed1ab_0        conda-forge
python       3.9.7     h49503c6_0_cpython  conda-forge
samtools     1.12      h9aed4be_1          bioconda
scipy        1.13.1    py39haf93ffa_0      conda-forge
It seems you used the same reference file for both control and treatment. However, to troubleshoot this, would you mind sharing a small dataset from each of the control and treatment files? I will try to run it on my end.
Please test this pipeline with the test dataset. Please check your FASTQ files: the read length is too short. I suggest running the pipeline with the test dataset; if it works fine, then there is some issue with your DRS data.
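To check read lengths yourself, a short stdlib-only sketch can summarize the sequence lengths in a FASTQ file. The function names are hypothetical, and it assumes the common 4-line FASTQ layout (header, sequence, '+', quality); multi-line FASTQ records are not handled.

```python
import gzip
import statistics


def read_lengths_from_lines(lines):
    """Yield the sequence length of each record in FASTQ text lines.

    Assumes 4-line records: header, sequence, '+', quality.
    """
    for index, line in enumerate(lines):
        if index % 4 == 1:  # the second line of each record is the sequence
            yield len(line.rstrip("\r\n"))


def fastq_read_lengths(path):
    """Read lengths from a FASTQ file, plain or gzip-compressed (.gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        yield from read_lengths_from_lines(handle)


def summarize_lengths(lengths):
    """Return read count, min, median, mean and max of the given lengths."""
    lengths = list(lengths)
    if not lengths:
        return {"reads": 0}
    return {
        "reads": len(lengths),
        "min": min(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.mean(lengths),
        "max": max(lengths),
    }
```

Usage: summarize_lengths(fastq_read_lengths("reads.fastq.gz")), where "reads.fastq.gz" is a placeholder for one of the files in your fastq/ input folder.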
I tried to run the program with the test data. Unfortunately, the same error occurred. Is there any solution?
(nano) prom@PC48A067:/data/pipeline/NanoPsiPy/example/test$ /data/pipeline/NanoPsiPy/bin/NanoPsiPy_comparison \
> -c /data/pipeline/NanoPsiPy/example/test/MALAT1_Wildtype_PUS7.csv \
> -t /data/pipeline/NanoPsiPy/example/test/MALAT1_Mutant_PUS7.csv \
> -o /data/pipeline/NanoPsiPy/example/test/NanoPsiPy_comparison_Mutant_vs_WildType \
> -d genome
Traceback (most recent call last):
  File "/data/pipeline/NanoPsiPy/bin/NanoPsiPy_comparison", line 31, in <module>
    merge_and_analyze(args.control_file, args.treatment_file, args.output_folder, args.data_type)
  File "/data/pipeline/NanoPsiPy/bin/NanoPsiPy_comparison", line 14, in merge_and_analyze
    analyze(merge_new)
  File "/data/miniconda_2024/envs/nano/lib/python3.9/site-packages/NanoPsiPy/chi_square.py", line 6, in analyze
    data = pd.read_csv('merged.csv')
  File "/data/miniconda_2024/envs/nano/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 948, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/data/miniconda_2024/envs/nano/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 611, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/data/miniconda_2024/envs/nano/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1448, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/data/miniconda_2024/envs/nano/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1705, in _make_engine
    self.handles = get_handle(
  File "/data/miniconda_2024/envs/nano/lib/python3.9/site-packages/pandas/io/common.py", line 863, in get_handle
    handle = open(
FileNotFoundError: [Errno 2] No such file or directory: 'merged.csv'
Dear,
Thank you for testing this tool. I've tested it and found that everything is functioning correctly. It appears that updating your dependencies according to the specified requirements may resolve the issue. Please ensure your Python version is updated to 3.11.0. It seems there's an issue generating a merged file from control.csv and treatment.csv, which is likely to be resolved after updating the necessary dependencies.
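The tracebacks in this thread show chi_square.py calling pd.read_csv('merged.csv') with a bare relative name, which Python resolves against the directory the command was launched from, not against the -o output folder. A small diagnostic along these lines could show which Python is active (compared with the suggested 3.11) and where, if anywhere, merged.csv ended up. The function name and report keys are hypothetical, and whether the merge step writes merged.csv to the working directory or to the output folder is an assumption this check is meant to settle.

```python
import sys
from pathlib import Path


def diagnose(output_folder, filename="merged.csv"):
    """Report the Python version and where `filename` can be found.

    A bare name such as 'merged.csv' is resolved against the current working
    directory; the -o output folder is checked too, in case the file was
    written there instead (an assumption).
    """
    return {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "python_at_least_3_11": sys.version_info >= (3, 11),
        "cwd": str(Path.cwd()),
        "in_cwd": (Path.cwd() / filename).is_file(),
        "in_output_folder": (Path(output_folder) / filename).is_file(),
    }


if __name__ == "__main__":
    # Pass the same -o folder used with NanoPsiPy_comparison, and run this
    # from the directory where that command was launched.
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    for key, value in diagnose(folder).items():
        print(f"{key}: {value}")
```

With the conda environment listed earlier (python 3.9.7), python_at_least_3_11 would report False.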
If possible, could you please share a preview of the control.csv and treatment.csv files?
Thank you
They are different. Have you updated all the dependencies?
Could you tell me what the problem is?