vetmohit89 / NanoPsiPy

GNU General Public License v3.0

utils.py, line 390, ValueError("Columns must be same length as key") #1

Closed assetdaniyarov closed 7 months ago

assetdaniyarov commented 7 months ago

Could you tell me what the problem is?

/data/pipeline/NanoPsiPy/bin/NanoPsiPy_comparison \
-c /data/dRNA/nanopsu/NanoPsiPy_Control_P2/NanoPsiPy_estimation_Control_P2.csv \
-t /data/dRNA/nanopsu/NanoPsiPy_AA_P2/NanoPsiPy_estimation_AA_P2.csv \
-o /data/dRNA/nanopsu/NanoPsiPy_comparison_Control_P2_vs_AA_P2 \
-d transcriptome
Traceback (most recent call last):
  File "/data/adaniyarov/pipeline/NanoPsiPy/bin/NanoPsiPy_comparison", line 31, in <module>
    merge_and_analyze(args.control_file, args.treatment_file, args.output_folder, args.data_type)
  File "/data/adaniyarov/pipeline/NanoPsiPy/bin/NanoPsiPy_comparison", line 11, in merge_and_analyze
    merge_new = merge_csvs(control_file, treatment_file, output_folder, data_type)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/miniconda3_2023/lib/python3.11/site-packages/NanoPsiPy/merge_script.py", line 30, in merge_csvs
    control[columns_to_split] = control["ID"].str.split(",", expand=True)
    ~~~~~~~^^^^^^^^^^^^^^^^^^
  File "/data/miniconda3_2023/lib/python3.11/site-packages/pandas/core/frame.py", line 4082, in __setitem__
    self._setitem_array(key, value)
  File "/data/miniconda3_2023/lib/python3.11/site-packages/pandas/core/frame.py", line 4124, in _setitem_array
    check_key_length(self.columns, key, value)
  File "/data/miniconda3_2023/lib/python3.11/site-packages/pandas/core/indexers/utils.py", line 390, in check_key_length
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
conda list | grep 
minimap2                  2.18                 h5bf99c6_0    bioconda
numpy                     1.24.0                   pypi_0    pypi
pandas                    2.1.0           py311h320fe9a_0    conda-forge
pickle4                   0.0.1                    pypi_0    pypi
python                    3.11.0          he550d4f_1_cpython    conda-forge
assetdaniyarov commented 7 months ago

setup.py - 'pandas>=1.1.0',

from setuptools import setup, find_packages

setup(
    name='NanoPsiPy',
    version='1.0',
    packages=find_packages(),
    include_package_data=True,
    scripts=["bin/NanoPsiPy_estimation", "bin/NanoPsiPy_comparison"],
    install_requires=[
        'numpy>=1.24.0',
        'pandas>=1.1.0',
        ],
    license="GPL 3.0"
)
vetmohit89 commented 7 months ago

Hello, which reference file did you use? Please use -d genome (if it is a GENCODE genome reference file) or -d transcriptome (if it is a GENCODE transcriptome reference file). Have you tested NanoPsiPy_comparison with the example data?
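
The ValueError in the traceback above is pandas reporting that the number of columns produced by splitting `ID` does not match the number of names in `columns_to_split`. A minimal, standard-library sketch of a pre-check follows. The separator, the expected field count, the sample IDs, and the helper name `count_id_fields` are assumptions for illustration and are not taken from NanoPsiPy's code.

```python
from collections import Counter

def count_id_fields(ids, sep):
    """Return a Counter mapping 'number of fields after splitting' to 'how many IDs'."""
    return Counter(len(identifier.split(sep)) for identifier in ids)

# Hypothetical IDs: one transcriptome-style header and one plain contig name.
ids = [
    "ENST00000327044.7|ENSG00000188976.11|NOC2L-201|NOC2L",
    "chr1",
]
expected_fields = 4  # assumed length of the target column list

counts = count_id_fields(ids, "|")
mismatched = {n: c for n, c in counts.items() if n != expected_fields}
print(counts)
print("IDs with an unexpected field count:", mismatched)
```

If any ID splits into a different number of fields than the column list expects, the assignment would fail in the same way, which is consistent with a mismatch between the `-d` option and the reference used.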

assetdaniyarov commented 7 months ago

Thank you for your feedback, it worked for me. Could you please tell me whether it is necessary to run merge_script.py or chi_square.py after the NanoPsiPy_comparison command? I didn't get a p-value in the results table, NanoPsiPy_estimation_Control_P2_vs_NanoPsiPy_estimation_AA_P2.csv (a chunk is attached below). Could you please provide some guidance?

ID,gene_id,havana_gene,havana_transcript,transcript_name,gene_name,ontology_id,RNA_feature,Direction,position,base_type,control_coverage,control_misC,control_C_reads,control_T_reads,treatment_coverage,treatment_misC,treatment_C_reads,treatment_T_reads
ENST00000327044.7,ENSG00000188976.11,OTTHUMG00000040720.2,OTTHUMT00000097869.2,NOC2L-201,NOC2L,2757,protein_coding,_F,1479,T,9.0,0.0,0.0,9.0,72.0,0.0,0.0,72.0
ENST00000327044.7,ENSG00000188976.11,OTTHUMG00000040720.2,OTTHUMT00000097869.2,NOC2L-201,NOC2L,2757,protein_coding,_F,1500,T,9.0,0.0,0.0,9.0,73.0,0.0410958904109589,3.0,70.0
ENST00000327044.7,ENSG00000188976.11,OTTHUMG00000040720.2,OTTHUMT00000097869.2,NOC2L-201,NOC2L,2757,protein_coding,_F,1505,T,9.0,0.0,0.0,9.0,73.0,0.0,0.0,73.0
ENST00000327044.7,ENSG00000188976.11,OTTHUMG00000040720.2,OTTHUMT00000097869.2,NOC2L-201,NOC2L,2757,protein_coding,_F,1515,T,9.0,0.0,0.0,9.0,73.0,0.0273972602739726,2.0,71.0
ENST00000327044.7,ENSG00000188976.11,OTTHUMG00000040720.2,OTTHUMT00000097869.2,NOC2L-201,NOC2L,2757,protein_coding,_F,1520,T,10.0,0.0,0.0,10.0,73.0,0.0136986301369863,1.0,72.0
ENST00000327044.7,ENSG00000188976.11,OTTHUMG00000040720.2,OTTHUMT00000097869.2,NOC2L-201,NOC2L,2757,protein_coding,_F,1521,T,10.0,0.0,0.0,10.0,73.0,0.0684931506849315,5.0,68.0
ENST00000327044.7,ENSG00000188976.11,OTTHUMG00000040720.2,OTTHUMT00000097869.2,NOC2L-201,NOC2L,2757,protein_coding,_F,1523,T,10.0,0.0,0.0,10.0,73.0,0.0958904109589041,7.0,66.0
ENST00000327044.7,ENSG00000188976.11,OTTHUMG00000040720.2,OTTHUMT00000097869.2,NOC2L-201,NOC2L,2757,protein_coding,_F,1527,T,11.0,0.0,0.0,11.0,73.0,0.0410958904109589,3.0,70.0
/data/pipeline/NanoPsiPy/bin/NanoPsiPy_comparison \
-c /data/dRNA/NanoPsiPy/Control_P2/NanoPsiPy_estimation_Control_P2.csv \
-t /data/dRNA/NanoPsiPy/AA_P2/NanoPsiPy_estimation_AA_P2.csv \
-o /data/dRNA/NanoPsiPy/NanoPsiPy_comparison_Control_P2_vs_AA_P2 \
-d transcriptome

The 'control_misC' values were not filtered as specified in the script chi_square.py:

import pandas as pd
from scipy.stats import chi2_contingency

def analyze(data):
    # Load the data
    data = pd.read_csv('/data/dRNA/NanoPsiPy/NanoPsiPy_comparison_Control_P2_vs_AA_P2/NanoPsiPy_estimation_Control_P2_vs_NanoPsiPy_estimation_AA_P2.csv')

    # Drop rows where control_misC is less than 0.10
    data = data[data['control_misC'] >= 0.10]
vetmohit89 commented 7 months ago

To compare these two samples, please run the NanoPsiPy_comparison command. It runs both the merge and chi-square Python scripts.
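
For readers unfamiliar with the chi-square step, here is a rough, standard-library sketch of a Pearson chi-square test on a 2x2 table of C and T read counts. The excerpt of chi_square.py quoted in this thread imports `scipy.stats.chi2_contingency`, but the table layout and options it uses are not shown, so the function below (`pearson_chi_square_2x2`, a hypothetical name) is an illustration only. It applies no continuity correction, so its numbers may differ from the tool's output.

```python
import math

def pearson_chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom, no continuity correction)
    for the table [[a, b], [c, d]]. Returns (statistic, p_value), or None when
    a row or column total is zero and the test is undefined."""
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return None
    n = row1 + row2
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Rows: control, treatment. Columns: C reads, T reads (position 1523 in the table above).
result = pearson_chi_square_2x2(0, 10, 7, 66)
# Position 1479 has no C reads in either sample, so a column total is zero.
undefined = pearson_chi_square_2x2(0, 9, 0, 72)
print(result, undefined)
```

Sites with no C reads in either sample have a zero column total, so no meaningful test statistic can be computed for them in this formulation.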

vetmohit89 commented 7 months ago

In NanoPsiPy, we drop anything with a misC below 0.10 as noise. Your data has misC values below 0.10 (i.e., less than 10%), which could be the reason you don't see any p-value.
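
The effect of that cutoff can be checked on the rows posted earlier in this thread. The sketch below uses only the standard library and three of those rows, reduced to the relevant columns. It mirrors the `control_misC >= 0.10` condition from the quoted chi_square.py excerpt and is not the tool's actual code path.

```python
import csv
import io

# Three rows from the table above, reduced to the columns needed here.
sample = """position,control_misC,treatment_misC
1479,0.0,0.0
1500,0.0,0.0410958904109589
1523,0.0,0.0958904109589041
"""

THRESHOLD = 0.10  # cutoff quoted in the thread

rows = list(csv.DictReader(io.StringIO(sample)))
kept = [row for row in rows if float(row["control_misC"]) >= THRESHOLD]
print(f"{len(kept)} of {len(rows)} rows pass the control_misC >= {THRESHOLD} filter")
```

Every posted row has a control_misC of 0.0, so none of them would survive the filter and reach the chi-square step.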

assetdaniyarov commented 7 months ago

Thank you very much. The p-value is in control_vs_treatment_result.csv, not in NanoPsiPy_estimation_Control_P2_vs_NanoPsiPy_estimation_AA_P2.csv

assetdaniyarov commented 3 months ago

Hello,

I encountered an issue while running a script in NanoPsiPy for comparing control and treatment datasets using transcriptome data. Here are the details of the error:

Error Message:

Traceback (most recent call last):
  File "/data/pipeline/NanoPsiPy/bin/NanoPsiPy_comparison", line 31, in <module>
    merge_and_analyze(args.control_file, args.treatment_file, args.output_folder, args.data_type)
  File "/data/pipeline/NanoPsiPy/bin/NanoPsiPy_comparison", line 14, in merge_and_analyze
    analyze(merge_new)
  File "/data/miniconda_2024/envs/nano/lib/python3.9/site-packages/NanoPsiPy/chi_square.py", line 6, in analyze
    data = pd.read_csv('merged.csv')
  File "/data/miniconda_2024/envs/nano/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 948, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/data/miniconda_2024/envs/nano/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 611, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/data/miniconda_2024/envs/nano/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1448, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/data/miniconda_2024/envs/nano/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1705, in _make_engine
    self.handles = get_handle(
  File "/data/miniconda_2024/envs/nano/lib/python3.9/site-packages/pandas/io/common.py", line 863, in get_handle
    handle = open(
FileNotFoundError: [Errno 2] No such file or directory: 'merged.csv'

Command:

/data/pipeline/NanoPsiPy/bin/NanoPsiPy_comparison \
-c /data/dRNA/RNA_MDA-MB-231_1repeat_27062024/NanoPsiPy/GM/NanoPsiPy_estimation_GM.csv \
-t /data/dRNA/RNA_MDA-MB-231_1repeat_27062024/NanoPsiPy/AA-/NanoPsiPy_estimation_AA-.csv \
-o /data/dRNA/RNA_MDA-MB-231_1repeat_27062024/NanoPsiPy/NanoPsiPy_comparison_Control_GM_vs_AA- \
-d transcriptome

Estimating NanoPsi Values for Treatment (AA-)

input=/data/RNA_MDA-MB-231_1repeat_27062024/AA-/20240627_1437_4E_PAI53966_7a639891/fastq/
reference=/data/PublicData/rna_refseq/gencode.v45.transcripts.fa
/data/pipeline/NanoPsiPy/bin/NanoPsiPy_estimation \
-i $input \
-r $reference \
-o /data/dRNA/RNA_MDA-MB-231_1repeat_27062024/NanoPsiPy/AA-/NanoPsiPy_estimation_AA-.csv \
-s treatment

Estimating NanoPsi Values for Control (GM)

input=/data/RNA_MDA-MB-231_1repeat_27062024/GM/20240627_1437_4B_PAI54106_ed2a8caa/fastq/
reference=/data/PublicData/rna_refseq/gencode.v45.transcripts.fa

/data/pipeline/NanoPsiPy/bin/NanoPsiPy_estimation \
-i $input \
-r $reference \
-o /data/dRNA/RNA_MDA-MB-231_1repeat_27062024/NanoPsiPy/GM/NanoPsiPy_estimation_GM.csv \
-s control

Conda list:

# Name                    Version                   Build  Channel
minimap2                  2.18                 h5bf99c6_0    https://anaconda.org/bioconda/minimap2/2.18/download
nanopsipy                 1.0                      pypi_0    pypi
numpy                     1.24.0           py39h223a676_0    conda-forge
pandas                    2.1.0            py39hddac248_0    conda-forge
pip                       24.0               pyhd8ed1ab_0    conda-forge
python                    3.9.7           h49503c6_0_cpython    conda-forge
samtools                  1.12                 h9aed4be_1    bioconda
scipy                     1.13.1           py39haf93ffa_0    conda-forge
vetmohit89 commented 3 months ago

It seems you have used the same reference file for both the control and the treatment. However, to troubleshoot this, would you mind sharing a small dataset from each of the control and treatment files? I will try to run it on my end.

assetdaniyarov commented 3 months ago

github.zip

vetmohit89 commented 3 months ago

Please test this pipeline with the TEST dataset. Looking at your fastq files, the read length is too short. I suggest running the pipeline with the test dataset; if it works fine, then there is some issue with your DRS data.

assetdaniyarov commented 2 months ago

I tried to run the program with the test data. Unfortunately, the same error occurred. Is there any solution?

(nano) prom@PC48A067:/data/pipeline/NanoPsiPy/example/test$ /data/pipeline/NanoPsiPy/bin/NanoPsiPy_comparison \
> -c /data/pipeline/NanoPsiPy/example/test/MALAT1_Wildtype_PUS7.csv \
> -t /data/pipeline/NanoPsiPy/example/test/MALAT1_Mutant_PUS7.csv \
> -o /data/pipeline/NanoPsiPy/example/test/NanoPsiPy_comparison_Mutant_vs_WildType \
> -d genome

Traceback (most recent call last):
  File "/data/pipeline/NanoPsiPy/bin/NanoPsiPy_comparison", line 31, in <module>
    merge_and_analyze(args.control_file, args.treatment_file, args.output_folder, args.data_type)
  File "/data/pipeline/NanoPsiPy/bin/NanoPsiPy_comparison", line 14, in merge_and_analyze
    analyze(merge_new)
  File "/data/miniconda_2024/envs/nano/lib/python3.9/site-packages/NanoPsiPy/chi_square.py", line 6, in analyze
    data = pd.read_csv('merged.csv')
  File "/data/miniconda_2024/envs/nano/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 948, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/data/miniconda_2024/envs/nano/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 611, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/data/miniconda_2024/envs/nano/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1448, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/data/miniconda_2024/envs/nano/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1705, in _make_engine
    self.handles = get_handle(
  File "/data/miniconda_2024/envs/nano/lib/python3.9/site-packages/pandas/io/common.py", line 863, in get_handle
    handle = open(
FileNotFoundError: [Errno 2] No such file or directory: 'merged.csv'
vetmohit89 commented 2 months ago

Dear,

Thank you for testing this tool. I've tested it and found that everything is functioning correctly. It appears that updating your dependencies according to the specified requirements may resolve the issue. Please ensure your Python version is updated to 3.11.0. It seems there's an issue generating a merged file from control.csv and treatment.csv, which is likely to be resolved after updating the necessary dependencies.
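
One thing that can be checked independently of dependency versions: the traceback shows `pd.read_csv('merged.csv')`, a relative path, which Python resolves against the current working directory rather than the output folder. Where NanoPsiPy actually writes the merged file is not shown in this thread, so the following is only a sketch for checking the likely locations; `find_merged_file` is a hypothetical helper.

```python
from pathlib import Path

def find_merged_file(output_folder, name="merged.csv"):
    """Return the candidate locations of `name` that actually exist."""
    candidates = [Path.cwd() / name, Path(output_folder) / name]
    return [path for path in candidates if path.is_file()]

if __name__ == "__main__":
    # Output folder taken from the test command earlier in the thread.
    out = "/data/pipeline/NanoPsiPy/example/test/NanoPsiPy_comparison_Mutant_vs_WildType"
    found = find_merged_file(out)
    print(found if found else "merged.csv not found in the working directory or the output folder")
```

If the file turns up in the output folder but not in the working directory, running the comparison command from inside the output folder may be worth trying.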

If possible, could you please share a preview of the control.csv and treatment.csv files?

Thank you

vetmohit89 commented 2 months ago

They are different. Have you updated all the dependencies?