sandberg-lab / Smart-seq3

Code and analysis pipeline for Smart-seq3 (Hagemann-Jensen et al. 2020).
GNU General Public License v3.0

A problem when following your pipeline ss3_isoform.py #4

Open loverlyday opened 3 years ago

loverlyday commented 3 years ago

I have fixed the errors I asked about 16 days ago, but I get a new error when doing isoform reconstruction:

"""
Traceback (most recent call last):
  File "/home/xxxx/anaconda3/envs/trim/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/zhouw/xxxx/envs/trim/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/xxxx/temp/ERR3835349/pyModule/isoform_reconstruct.py", line 226, in _run_isoform
    results = aligned.groupby(by='BC_UB').apply(_isoform_inference_of_single_molec, ref_iso_dict[gene])
  File "/home/xxxx/anaconda3/envs/trim/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 859, in apply
    result = self._python_apply_general(f, self._selected_obj)
  File "/home/xxxx/anaconda3/envs/trim/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 892, in _python_apply_general
    keys, values, mutated = self.grouper.apply(f, data, self.axis)
  File "/home/xxxx/anaconda3/envs/trim/lib/python3.7/site-packages/pandas/core/groupby/ops.py", line 220, in apply
    res = f(group)
  File "/home/xxxx/anaconda3/envs/trim/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 843, in f
    return func(g, *args, **kwargs)
  File "/home/xxxx/temp/ERR3835349/pyModule/isoform_reconstruct.py", line 196, in _isoform_inference_of_single_molec
    out = [aligned_reads_df[16].iloc[0], aligned_reads_df[17].iloc[0],
  File "/home/xxxx/anaconda3/envs/trim/lib/python3.7/site-packages/pandas/core/indexing.py", line 879, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/home/xxxx/anaconda3/envs/trim/lib/python3.7/site-packages/pandas/core/indexing.py", line 1496, in _getitem_axis
    self._validate_integer(key, axis)
  File "/home/xxxx/anaconda3/envs/trim/lib/python3.7/site-packages/pandas/core/indexing.py", line 1437, in _validate_integer
    raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/xxxx/temp/ERR3835349/ss3_isofrom.py", line 109, in <module>
    main()
  File "/home/xxxx/temp/ERR3835349/ss3_isofrom.py", line 105, in main
    get_isoforms(conf_data, out_path, ref)
  File "/home/xxxx/temp/ERR3835349/pyModule/isoform_reconstruct.py", line 469, in get_isoforms
    pool.map(func, remain_genes, chunksize=1)
  File "/home/xxxx/anaconda3/envs/trim/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/xxxx/anaconda3/envs/trim/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
IndexError: single positional indexer is out-of-bounds

The error occurs on the last remaining gene files. When I run just that gene file with "results = aligned.groupby(by='BC_UB').apply(_isoform_inference_of_single_molec, ref_iso_dict[gene])", I find that the error occurs when _run_isoform is called. What is the difference between _run_isoform and isoform_inference_correction_by_ass_v2? If _run_isoform is not essential, I will skip it. Looking forward to your early reply.

PingChen-Angela commented 3 years ago

The error looks like there might be no content in your input "aligned_reads_df". Is it empty?
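To illustrate the failure mode being asked about: the last line of the traceback is the message pandas raises when `.iloc[0]` is called on an empty selection. The sketch below reproduces it on a made-up empty frame and shows one possible guard. `first_or_none` is a hypothetical helper, not part of the pipeline, and the column labels 16 and 17 are simply copied from the traceback.

```python
import pandas as pd


def first_or_none(series):
    """Return the first value of a Series, or None if the Series is empty."""
    return series.iloc[0] if len(series) > 0 else None


# An empty frame carrying the two column labels seen in the traceback.
empty = pd.DataFrame({16: [], 17: []})

try:
    empty[16].iloc[0]
    message = None
except IndexError as err:
    # pandas refuses positional access on an empty Series.
    message = str(err)

print(message)
print(first_or_none(empty[16]))
```

Whether returning None is the right behaviour for the real function depends on what the caller expects, which the thread does not show.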

loverlyday commented 3 years ago

No, the file is not empty. The error occurs on the last gene in the set, and which gene that is seems random. Maybe you can try with your own data when the first "remaining files" count is not equal to zero; then you may see the error.

PingChen-Angela commented 3 years ago

Hi @loverlyday, I cannot see the error from my side. Can you send me the gene file where the error occurred? It is under the keptReads/chr folder with file name "[gene name]_aligned_reads.csv". That will help me find out the reason.
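One quick way to look for suspicious inputs before sending files is to list the gene files that contain no bytes at all. This is only a sketch: it assumes the files sit somewhere below a keptReads folder and end in `_aligned_reads.csv`, as described above, and `find_empty_gene_files` is a hypothetical helper name. The demo builds a throwaway directory with made-up gene names.

```python
import tempfile
from pathlib import Path


def find_empty_gene_files(kept_reads_dir):
    """List '*_aligned_reads.csv' files below kept_reads_dir that are zero bytes long."""
    root = Path(kept_reads_dir)
    return sorted(
        path for path in root.rglob("*_aligned_reads.csv")
        if path.stat().st_size == 0
    )


# Demo on a temporary directory that mimics the assumed layout.
with tempfile.TemporaryDirectory() as tmp:
    chrom_dir = Path(tmp) / "keptReads" / "chr1"
    chrom_dir.mkdir(parents=True)
    (chrom_dir / "GENE_A_aligned_reads.csv").write_text("read1\tchr1\n")
    (chrom_dir / "GENE_B_aligned_reads.csv").write_text("")
    empty_names = [p.name for p in find_empty_gene_files(Path(tmp) / "keptReads")]

print(empty_names)
```

A file can also be non-empty and still yield empty groups, so an empty result here does not rule out the problem.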

yiyelinfeng commented 3 years ago

Hi @loverlyday, I am hitting the same error. Have you solved it? Looking forward to your reply. Thanks!

PingChen-Angela commented 3 years ago

Hi, @yiyelinfeng! Can you send me one of the gene files under the keptReads/chr folder with file name "[gene name]_aligned_reads.csv"? That will help me fix the issue. Thanks!

yiyelinfeng commented 3 years ago

Hi PingChen-Angela, the files you need are attached. Thank you very much!

(Sent by email in reply to PingChen-Angela's message of 2021-08-06 15:41:08.)

kwglam commented 3 years ago

Hi @yiyelinfeng and PingChen-Angela,

After running the ss3_isoform.py program for almost two weeks, I have now got exactly the same error as you reported (please see the error message below). Have you figured out what the problem is? Would you kindly share how you fixed this bug? Thank you very much in advance for replying to my message.

Error message:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/data/xxx/conda/envs/ss3iso_trial/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/data/xxx/conda/envs/ss3iso_trial/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/gpfs/gsfs10/users/xxx/ss3iso/ss3iso_downloads/Smart-seq3/ss3iso/pyModule/isoform_reconstruct.py", line 226, in _run_isoform
    results = aligned.groupby(by='BC_UB').apply(_isoform_inference_of_single_molec, ref_iso_dict[gene])
  File "/data/xxx/conda/envs/ss3iso_trial/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1253, in apply
    result = self._python_apply_general(f, self._selected_obj)
  File "/data/xxx/conda/envs/ss3iso_trial/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1287, in _python_apply_general
    keys, values, mutated = self.grouper.apply(f, data, self.axis)
  File "/data/xxx/conda/envs/ss3iso_trial/lib/python3.7/site-packages/pandas/core/groupby/ops.py", line 820, in apply
    res = f(group)
  File "/data/xxx/conda/envs/ss3iso_trial/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 1237, in f
    return func(g, *args, **kwargs)
  File "/gpfs/gsfs10/users/xxx/ss3iso/ss3iso_downloads/Smart-seq3/ss3iso/pyModule/isoform_reconstruct.py", line 196, in _isoform_inference_of_single_molec
    out = [aligned_reads_df[16].iloc[0], aligned_reads_df[17].iloc[0],
  File "/data/xxx/conda/envs/ss3iso_trial/lib/python3.7/site-packages/pandas/core/indexing.py", line 931, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/data/xxx/conda/envs/ss3iso_trial/lib/python3.7/site-packages/pandas/core/indexing.py", line 1566, in _getitem_axis
    self._validate_integer(key, axis)
  File "/data/xxx/conda/envs/ss3iso_trial/lib/python3.7/site-packages/pandas/core/indexing.py", line 1500, in _validate_integer
    raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/data/xxx/ss3iso/ss3iso_downloads/Smart-seq3/ss3iso/ss3_isoform.py", line 109, in <module>
    main()
  File "/data/xxx/ss3iso/ss3iso_downloads/Smart-seq3/ss3iso/ss3_isoform.py", line 105, in main
    get_isoforms(conf_data, out_path, ref)
  File "/gpfs/gsfs10/users/xxx/ss3iso/ss3iso_downloads/Smart-seq3/ss3iso/pyModule/isoform_reconstruct.py", line 469, in get_isoforms
    pool.map(func, remain_genes, chunksize=1)
  File "/data/xxx/conda/envs/ss3iso_trial/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/data/xxx/conda/envs/ss3iso_trial/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
IndexError: single positional indexer is out-of-bounds

PingChen-Angela commented 3 years ago

@yiyelinfeng and @kwglam, I have updated the code; please try again.

kwglam commented 3 years ago

Hi Angela,

Thanks for updating the script. I will try to run it again.

BTW, I am wondering how much walltime you usually need to run the ss3_isoform.py script with 50 processors. I previously used 10 processors to run the script and it took more than 12 days to hit the issue. Thanks!

PingChen-Angela commented 3 years ago

It might be a bit slow for a big dataset, but you don't need to rerun everything. You can just update the code there and use the same output folder.

kwglam commented 3 years ago

But I guess I still have to run the entire quantification part (-Q), right? I reckon that if I use the same output folder, it will skip genes that already exist. Is that what you mean?

PingChen-Angela commented 3 years ago

Yes.
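As a rough picture of what "skip genes that already exist" can look like, here is a minimal sketch of resume logic. It is not the pipeline's actual code: the helper name `remaining_genes`, the `.txt` suffix, and the choice to treat zero-byte files as unfinished are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def remaining_genes(all_genes, out_dir, suffix=".txt"):
    """Return the genes that do not yet have a non-empty result file in out_dir."""
    done = {
        path.name[: -len(suffix)]
        for path in Path(out_dir).glob(f"*{suffix}")
        if path.stat().st_size > 0  # assumption: an empty file means "not finished"
    }
    return [gene for gene in all_genes if gene not in done]


# Demo: GENE_A is finished, GENE_B was left empty, GENE_C has no file yet.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "GENE_A.txt").write_text("result\n")
    (Path(tmp) / "GENE_B.txt").write_text("")
    todo = remaining_genes(["GENE_A", "GENE_B", "GENE_C"], tmp)

print(todo)
```

Treating empty files as unfinished is a design choice: it avoids carrying a half-written result from an interrupted run into the next step.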

kwglam commented 3 years ago

Hi Angela,

I have run your script with the updated code and it finally generated the 'assigned_isoforms' folder. However, it hit another error:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/gpfs/gsfs10/users/lamkg/ss3iso/ss3iso_downloads/Smart-seq3/ss3iso/pyModule/isoform_reconstruct.py", line 336, in isoform_inference_correction_by_ass_v2
    ass_junc = get_junction(ass, trans_df)
  File "/gpfs/gsfs10/users/lamkg/ss3iso/ss3iso_downloads/Smart-seq3/ss3iso/pyModule/isoform_reconstruct.py", line 260, in get_junction
    ass_start_junc = tmp.apply(_get_junc_start, axis=1, trans_df=trans_df)
  File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/site-packages/pandas/core/frame.py", line 8736, in apply
    return op.apply()
  File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/site-packages/pandas/core/apply.py", line 688, in apply
    return self.apply_standard()
  File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/site-packages/pandas/core/apply.py", line 812, in apply_standard
    results, res_index = self.apply_series_generator()
  File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/site-packages/pandas/core/apply.py", line 828, in apply_series_generator
    results[i] = self.f(v)
  File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/site-packages/pandas/core/apply.py", line 131, in f
    return func(x, *args, **kwargs)
  File "/gpfs/gsfs10/users/lamkg/ss3iso/ss3iso_downloads/Smart-seq3/ss3iso/pyModule/isoform_reconstruct.py", line 272, in _get_junc_start
    row_idx = list(trans_df.query('Exon_Idx=="%s" and Transcripts=="%s"' %(x[3], x[2])).index)[0]
IndexError: list index out of range
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/data/lamkg/ss3iso/ss3iso_downloads/Smart-seq3/ss3iso/ss3_isoform.py", line 109, in <module>
    main()
  File "/data/lamkg/ss3iso/ss3iso_downloads/Smart-seq3/ss3iso/ss3_isoform.py", line 105, in main
    get_isoforms(conf_data, out_path, ref)
  File "/gpfs/gsfs10/users/lamkg/ss3iso/ss3iso_downloads/Smart-seq3/ss3iso/pyModule/isoform_reconstruct.py", line 484, in get_isoforms
    pool.map(func, infered_gene_paths, chunksize=1)
  File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
IndexError: list index out of range

Any insights on this? Thanks a lot!!

Gabriel
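For readers following the traceback: the failing line takes element `[0]` of a lookup that matched no rows, which is what produces "list index out of range". The sketch below shows the same lookup with a guard, using a boolean mask instead of `query` (a swapped-in but equivalent way to filter). `find_exon_row` and the toy table are hypothetical; only the column names `Exon_Idx` and `Transcripts` come from the traceback.

```python
import pandas as pd


def find_exon_row(trans_df, exon_idx, transcript):
    """Return the index label of the first matching row, or None if nothing matches."""
    mask = (trans_df["Exon_Idx"] == exon_idx) & (trans_df["Transcripts"] == transcript)
    hits = trans_df.index[mask]
    return hits[0] if len(hits) > 0 else None


# Made-up transcript table for illustration only.
trans_df = pd.DataFrame({
    "Transcripts": ["T1", "T1", "T2"],
    "Exon_Idx": ["1", "2", "1"],
})

found = find_exon_row(trans_df, "2", "T1")    # a row that exists
missing = find_exon_row(trans_df, "3", "T1")  # no such exon in this table
print(found, missing)
```

A guard like this only turns the crash into a value the caller must handle; it does not explain why the lookup came back empty.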

PingChen-Angela commented 3 years ago

Hi Gabriel, thanks for reporting this error. I will look into it soon. In the meantime, if you change your number of processes, do you still see the error?

kwglam commented 3 years ago

Hi Angela, I very much appreciate your effort and time. I used the default number of processes (8) on the command line. Do I have to specify the same number in the config file (nproc=8)? What is the appropriate number to use? Thanks!

kwglam commented 3 years ago

Hi Angela, I tried with nproc=8 in the config. The program terminated again with the same error, but at a different gene. Meanwhile, I was running the program with another BAM file, and it encountered another problem that had not appeared before.

ENSG00000141384
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/gpfs/gsfs10/users/lamkg/ss3iso/ss3iso_downloads/Smart-seq3/ss3iso/pyModule/isoform_reconstruct.py", line 300, in isoform_inference_correction_by_ass_v2
    initial_infered = pd.read_table(gene_file, header=None, index_col=None, sep="\t")
  File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 683, in read_table
    return _read(filepath_or_buffer, kwds)
  File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 482, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 811, in __init__
    self._engine = self._make_engine(self.engine)
  File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 1040, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 69, in __init__
    self._reader = parsers.TextReader(self.handles.handle, **kwds)
  File "pandas/_libs/parsers.pyx", line 549, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/data/lamkg/ss3iso/ss3iso_downloads/Smart-seq3/ss3iso/ss3_isoform.py", line 109, in <module>
    main()
  File "/data/lamkg/ss3iso/ss3iso_downloads/Smart-seq3/ss3iso/ss3_isoform.py", line 105, in main
    get_isoforms(conf_data, out_path, ref)
  File "/gpfs/gsfs10/users/lamkg/ss3iso/ss3iso_downloads/Smart-seq3/ss3iso/pyModule/isoform_reconstruct.py", line 484, in get_isoforms
    pool.map(func, infered_gene_paths, chunksize=1)
  File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/data/lamkg/conda/envs/ss3iso_trial/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
pandas.errors.EmptyDataError: No columns to parse from file

Do you think these problems were caused by the number of processes used or by specific genes? Thanks!
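For context on the second message: `pandas.errors.EmptyDataError` is what `pd.read_table` raises when the file it is given has no content. A minimal sketch of reading such a file defensively is below; `read_gene_table` is a hypothetical wrapper, and the `read_table` arguments are copied from the traceback. The demo files are made up.

```python
import os
import tempfile

import pandas as pd
from pandas.errors import EmptyDataError


def read_gene_table(path):
    """Read a headerless tab-separated file; return None if it has no content."""
    try:
        return pd.read_table(path, header=None, index_col=None, sep="\t")
    except EmptyDataError:
        return None


# Demo with one empty and one filled file.
with tempfile.TemporaryDirectory() as tmp:
    empty_path = os.path.join(tmp, "empty_gene.txt")
    open(empty_path, "w").close()
    filled_path = os.path.join(tmp, "filled_gene.txt")
    with open(filled_path, "w") as handle:
        handle.write("a\tb\n")
    empty_result = read_gene_table(empty_path)
    filled_shape = read_gene_table(filled_path).shape

print(empty_result, filled_shape)
```

An empty intermediate file may itself be a symptom of an earlier step that failed or was interrupted, so skipping it silently could hide the real cause.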

PingChen-Angela commented 3 years ago

Hi @kwglam, the issue came from parallelisation. Can you send me the gene file with name "ENSG00000141384" under .R1 in your output folder? Please send to my email address angela.pingchen@gmail.com.
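While tracking down a problem like this, it can help to record which genes fail instead of letting the first exception abort `pool.map`. The sketch below is one way to do that and is not the fix applied in the repository. `run_safely` and `toy_worker` are hypothetical, and a `ThreadPool` is used only so the example runs anywhere without a `__main__` guard; it exposes the same `map` call as the process pool in the tracebacks.

```python
from multiprocessing.pool import ThreadPool


def run_safely(task):
    """Run func(gene) and return (gene, result, error) instead of raising."""
    func, gene = task
    try:
        return gene, func(gene), None
    except Exception as err:  # record the failure and let other genes continue
        return gene, None, repr(err)


def toy_worker(gene):
    """Stand-in for the per-gene step; fails for one made-up gene name."""
    if gene == "GENE_B":
        raise IndexError("list index out of range")
    return len(gene)


genes = ["GENE_A", "GENE_B", "GENE_C"]
with ThreadPool(2) as pool:
    outcomes = pool.map(run_safely, [(toy_worker, g) for g in genes], chunksize=1)

# Genes whose worker raised, in the original order.
failed = [gene for gene, _, error in outcomes if error is not None]
print(failed)
```

The list of failed genes then tells you which files to inspect or send, and whether the same genes fail again on a rerun.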

xucaoling commented 2 years ago

Hi Angela, when I use ss3_isoform.py, I get an error. The error message is:

Preprocessing on input BAM ...
[bam_sort_core] merging from 88 files and 8 in-memory blocks...
Collect informative reads per gene...
...for genes on chr1
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/data/vip55/miniconda3/envs/zUMIs-env/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/data/vip55/miniconda3/envs/zUMIs-env/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/data/vip55/software/Smart-seq3-master/ss3iso/pyModule/informative_reads.py", line 479, in _get_reads
    report_gene = gobj.get_aligned_reads(n_read_limit, passed_cells)
  File "/home/data/vip55/software/Smart-seq3-master/ss3iso/pyModule/informative_reads.py", line 84, in get_aligned_reads
    samfile = pysam.AlignmentFile(self.in_bam_uniq, "rc")
  File "pysam/libcalignmentfile.pyx", line 741, in pysam.libcalignmentfile.AlignmentFile.__cinit__
  File "pysam/libcalignmentfile.pyx", line 990, in pysam.libcalignmentfile.AlignmentFile._open
ValueError: file has no sequences defined (mode='rc') - is it SAM/BAM format? Consider opening with check_sq=False
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/data/vip55/software/Smart-seq3-master/ss3iso/ss3_isoform.py", line 109, in <module>
    main()
  File "/home/data/vip55/software/Smart-seq3-master/ss3iso/ss3_isoform.py", line 99, in main
    fetch_gene_reads(in_bam_uniq, in_bam_multi, conf_data, op.species, out_path)
  File "/home/data/vip55/software/Smart-seq3-master/ss3iso/pyModule/informative_reads.py", line 550, in fetch_gene_reads
    report_genes = pool.map(func, genes, chunksize=1)
  File "/home/data/vip55/miniconda3/envs/zUMIs-env/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/data/vip55/miniconda3/envs/zUMIs-env/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
ValueError: file has no sequences defined (mode='rc') - is it SAM/BAM format? Consider opening with check_sq=False

My command is:

$ python /home/data/vip55/software/Smart-seq3-master/ss3iso/ss3_isoform.py -i smartseq3_mouse_fibroblast.filtered.Aligned.GeneTagged.UBcorrected.sorted.bam -e smartseq3_mouse_fibroblast -o ss3 -p 8 -s mm10 -P -R -c ss3_isoform.conf

I don't know how to solve the problem. Will you help me out?
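The message says the file being opened declares no reference sequences. One way to check that for a BAM file without pysam is to read the binary header directly, as in the sketch below. It follows the BAM header layout (magic bytes, header text length, header text, reference count) and relies on BGZF being readable by Python's `gzip` module. `bam_reference_count` is a hypothetical helper, and the demo file is a hand-built header with zero references, not real data.

```python
import gzip
import os
import struct
import tempfile


def bam_reference_count(path):
    """Return the number of reference sequences declared in a BAM header."""
    with gzip.open(path, "rb") as handle:
        if handle.read(4) != b"BAM\x01":
            raise ValueError("not a BAM file (magic bytes missing)")
        (text_length,) = struct.unpack("<I", handle.read(4))
        handle.read(text_length)  # skip the plain-text SAM header
        (n_ref,) = struct.unpack("<I", handle.read(4))
    return n_ref


# Demo: a minimal header-only BAM payload that declares zero references.
header_text = b"@HD\tVN:1.6\n"
payload = (
    b"BAM\x01"
    + struct.pack("<I", len(header_text))
    + header_text
    + struct.pack("<I", 0)
)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "no_refs.bam")
    with open(demo_path, "wb") as handle:
        handle.write(gzip.compress(payload))
    demo_count = bam_reference_count(demo_path)

print(demo_count)
```

A count of zero on the intermediate BAM the pipeline writes would match the message above; a non-zero count would suggest looking elsewhere, for example at how the file is opened.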

jiangfuqing commented 1 year ago

Hi, have you fixed this error? thanks