mcgilldinglab / MATES

A Deep Learning-Based Model for Quantifying Transposable Elements in Single-Cell Sequencing Data
MIT License

Failed to open file "./unique_read/sample/by_barcode/*.bam" #7

Open Yuanqingq opened 2 weeks ago

Yuanqingq commented 2 weeks ago

Hi, thank you for developing this software. I would like to try it, but I ran into some problems. I ran MATES like this:

import MATES
from MATES import bam_processor
from MATES import data_processor
from MATES import MATES_model
from MATES import TE_quantifier
from MATES import TE_quantifier_LongRead
from MATES import TE_quantifier_Intronic
bam_processor.split_bam_files("10X",20,"samplelist.txt","bam_path_file.txt",bc_ind="CR",bc_path_file="barcode.txt")

But I got an error: samtools failed to open "./unique_read/sample/by_barcode/*.bam" and "./multi_read/sample/by_barcode/*.bam". Did I do something wrong, and how can I fix it?

Directory ./file_tmp created.
Directory ./bam_tmp created.
Directory ./bc_tmp created.
Start splitting bam files into unique/multi reads sub-bam files ...
Directory ./unique_read created.
Directory ./multi_read created.
Finish splitting bam files into unique reads and multi reads sub-bam files.
Start splitting multi sub-bam based on cell barcodes...
[E::hts_open_format] Failed to open file "./unique_read/sample/by_barcode/*.bam" : No such file or directory
samtools sort: can't open "./unique_read/sample/by_barcode/*.bam": No such file or directory
[E::hts_open_format] Failed to open file "./unique_read/sample/by_barcode/*.bam" : No such file or directory
samtools index: failed to open "./unique_read/sample/by_barcode/*.bam": No such file or directory
Finish splitting unique sub-bam.
[E::hts_open_format] Failed to open file "./multi_read/sample/by_barcode/*.bam" : No such file or directory
samtools sort: can't open "./multi_read/sample/by_barcode/*.bam": No such file or directory
[E::hts_open_format] Failed to open file "./multi_read/sample/by_barcode/*.bam" : No such file or directory
samtools index: failed to open "./multi_read/sample/by_barcode/*.bam": No such file or directory
Finish splitting multi sub-bam.
Directory ./file_tmp removed.
Directory ./bam_tmp removed.
Directory ./bc_tmp removed.

I also have another question. In your sample_pipeline.ipynb, you run python ../build_reference.py --species Mouse before bam_processor. After noticing that, I ran python ../build_reference.py --species Mouse as well, but I got this error:

mm10.fa.out.gz already exists, skipping download and unzip.
gencode.vM10.annotation.gtf.gz already exists, skipping download and unzip.
Traceback (most recent call last):
  File "/home/Software/MATES/MATES/../build_reference.py", line 153, in <module>
    main()
  File "/home/Software/MATES/MATES/../build_reference.py", line 41, in main
    genes = pd.read_csv(f"{species.lower()}_Genes.csv")
  File "/home/miniconda3/envs/MATES_gitclone/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 912, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/miniconda3/envs/MATES_gitclone/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 577, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/miniconda3/envs/MATES_gitclone/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1407, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/home/miniconda3/envs/MATES_gitclone/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1661, in _make_engine
    self.handles = get_handle(
  File "/home/miniconda3/envs/MATES_gitclone/lib/python3.9/site-packages/pandas/io/common.py", line 859, in get_handle
    handle = open(
FileNotFoundError: [Errno 2] No such file or directory: 'mouse_Genes.csv'

What I actually want to analyze is Drosophila melanogaster (fruit fly). I ran python ../build_reference.py --species Other --other_species_TE dm6.fa.out --other_species_GTF dm6_changeChr_forMATES.gtf and got a different error:

Traceback (most recent call last):
  File "/home/Software/MATES/MATES/../build_reference.py", line 153, in <module>
    main()
  File "/home/Software/MATES/MATES/../build_reference.py", line 46, in main
    TEs = TEs[["genoName","genoStart","genoEnd", "strand","index", "repName","repClass"]]
  File "/home/miniconda3/envs/MATES_gitclone/lib/python3.9/site-packages/pandas/core/frame.py", line 3766, in __getitem__
    indexer = self.columns._get_indexer_strict(key, "columns")[1]
  File "/home/miniconda3/envs/MATES_gitclone/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 5876, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "/home/miniconda3/envs/MATES_gitclone/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 5938, in _raise_if_missing
    raise KeyError(f"{not_found} not in index")
KeyError: "['genoName', 'genoStart', 'genoEnd', 'strand', 'repName', 'repClass'] not in index"

So I edited dm6.fa.out; it now looks like this:

head dm6.fa.out
SW_score,perc_div.,perc_del.,perc_ins.,genoName,genoStart,genoEnd,(left),strand,repName,repClass,in_end,repeat_(left),ID,other
166,7.9,0.6,2.4,chr2L,2,170,(23513542),+,HETRP_DM,Satellite,1507,1672,(175),1
27,15.7,0.0,2.0,chr2L,215,266,(23513446),+,HETRP_DM,Satellite,1571,1621,(226),1
65,11.5,24.2,3.4,chr2L,310,408,(23513304),+,HETRP_DM,Satellite,1504,1622,(225),1
170,11.0,0.6,2.3,chr2L,452,628,(23513084),+,HETRP_DM,Satellite,1499,1672,(175),1
27,15.7,0.0,2.0,chr2L,673,724,(23512988),+,HETRP_DM,Satellite,1571,1621,(226),1
65,11.5,24.2,3.4,chr2L,768,866,(23512846),+,HETRP_DM,Satellite,1504,1622,(225),1
170,11.0,0.6,2.3,chr2L,910,1086,(23512626),+,HETRP_DM,Satellite,1499,1672,(175),1
27,15.7,0.0,2.0,chr2L,1131,1182,(23512530),+,HETRP_DM,Satellite,1571,1621,(226),1
65,11.5,24.2,3.4,chr2L,1226,1324,(23512388),+,HETRP_DM,Satellite,1504,1622,(225),1
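
For reference, the renaming above can be scripted rather than done by hand. The following is a minimal sketch, assuming the input is a standard RepeatMasker .out file (header lines followed by whitespace-separated fields) and that build_reference.py needs the six column names listed in the KeyError. The helper names parse_repeatmasker_out and write_te_csv are hypothetical and not part of MATES. Whether the script expects a comma-separated file, extra columns, or shifted coordinates is not confirmed in this thread.

```python
import csv

# Column names are taken from the KeyError above. Positions follow the
# standard RepeatMasker .out field order (0-based token index per line).
FIELD_POSITIONS = {
    "genoName": 4,
    "genoStart": 5,
    "genoEnd": 6,
    "strand": 8,
    "repName": 9,
    "repClass": 10,
}


def parse_repeatmasker_out(lines):
    """Return one dict per annotation line, skipping header and blank lines."""
    rows = []
    for line in lines:
        fields = line.split()
        # Data lines start with an integer SW score and have at least 15 fields.
        if len(fields) < 15 or not fields[0].isdigit():
            continue
        row = {name: fields[pos] for name, pos in FIELD_POSITIONS.items()}
        # Coordinates are kept exactly as they appear in the .out file.
        row["genoStart"] = int(row["genoStart"])
        row["genoEnd"] = int(row["genoEnd"])
        rows.append(row)
    return rows


def write_te_csv(rows, out_path):
    """Write the parsed rows as a CSV with a header line (assumed format)."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(FIELD_POSITIONS))
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical usage:
# with open("dm6.fa.out") as handle:
#     write_te_csv(parse_repeatmasker_out(handle), "dm6_TEs.csv")
```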

I also changed the line TEs = TEs[["genoName","genoStart","genoEnd", "strand","index", "repName","repClass"]] in build_reference.py, because the rest of the script uses the variable TE, not TEs:

TE = TEs[["genoName","genoStart","genoEnd", "strand","index", "repName","repClass"]]
TE.columns = ['TE_chrom','start','end','index','strand','TE_Name','TE_Fam']

But it still fails:

Traceback (most recent call last):
  File "/home/Software/MATES/MATES/../build_reference.py", line 153, in <module>
    main()
  File "/home/Software/MATES/MATES/../build_reference.py", line 57, in main
    genes = genes[genes['Feature'] == 'gene']
  File "/home/miniconda3/envs/MATES_gitclone/lib/python3.9/site-packages/pyranges/pyranges_main.py", line 437, in __getitem__
    return _getitem(self, val)
  File "/home/miniconda3/envs/MATES_gitclone/lib/python3.9/site-packages/pyranges/methods/getitem.py", line 33, in _getitem
    raise Exception("Not a valid subsetter: {}".format(str(val)))
Exception: Not a valid subsetter: False

So my questions are: should I build the TE reference first and then run bam_processor? And did I do anything wrong in the bam_processor step?

Many thanks! Yuanqingq

Szym29 commented 2 weeks ago

Hi,

Thank you for your interest in using our tool. For the first question, it seems to be a file path issue. An easy fix to try is to edit 'bam_path_file.txt' so that it uses the complete (absolute) paths to your BAM files.
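
A quick way to rule out path problems before calling split_bam_files is to check the three list files up front. This is only a sketch, assuming each list file holds one entry per line and that line i of samplelist.txt, bam_path_file.txt and barcode.txt describe the same sample; the function names read_entries and check_inputs are hypothetical and not part of MATES.

```python
import os


def read_entries(list_path):
    """Return the non-empty, stripped lines of a list file."""
    with open(list_path) as handle:
        return [line.strip() for line in handle if line.strip()]


def check_inputs(sample_list, bam_list, barcode_list):
    """Return a list of human-readable problems (empty if none were found)."""
    samples = read_entries(sample_list)
    bams = read_entries(bam_list)
    barcodes = read_entries(barcode_list)

    problems = []
    # Assumption: the three files are aligned line by line.
    if not (len(samples) == len(bams) == len(barcodes)):
        problems.append(
            f"line counts differ: {len(samples)} samples, "
            f"{len(bams)} BAM paths, {len(barcodes)} barcode paths"
        )
    for path in bams + barcodes:
        if not os.path.isabs(path):
            problems.append(f"not an absolute path: {path}")
        if not os.path.isfile(path):
            problems.append(f"file not found: {path}")
    return problems


# Hypothetical usage with the file names from the original call:
# for problem in check_inputs("samplelist.txt", "bam_path_file.txt", "barcode.txt"):
#     print(problem)
```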

For the second question, we will upload the csv files very soon. They were in the repo but seem to have been removed by mistake.

For the third question, we have also run MATES on Drosophila. @RoKsaNne, can we include Drosophila reference files in MATES, as we do for human and mouse? Could you also look at the customized 'dm6.fa.out' that @Yuanqingq shared and check whether anything is wrong with the file format?

Thanks, Yumin

Yuanqingq commented 2 weeks ago

Thank you very much for your reply! I tried using the complete path to my BAM file, but I still got the error. I checked ./unique_read/sample/sample_unique_splitting.log and found many errors in it:

CR ./unique_read/sample_uniqueread.bam /home/dm6/MATES_test4/STARsolo/sampleSolo.out/Gene/filtered/barcodes.tsv ./unique_read/sample/by_barcode/
error
error
error
error
error
......
error
error
error
error
Finish Batch: 1
error
error
error
error
error
......
error
error
error
error
Finish Batch: 2
error
error
error
error
error
......
error
error
Finish Batch: 3
Successfully End on Splitting: ./unique_read/sample_uniqueread.bam

There are also no files in the folder ./unique_read/sample/by_barcode, so it seems the BAM file is not being split by barcode. What could be wrong? Thank you very much!
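
The log does not say what each "error" line means, so the cause is not established in this thread. Two things that may be worth ruling out are reads that carry no CR tag at all, and CR values that never match a line in barcodes.tsv. The sketch below counts both cases on the first reads of the unique-read BAM. It assumes pysam is installed; the helper names load_barcodes, iter_tag_values and summarize_tags are hypothetical and not part of MATES.

```python
from collections import Counter


def load_barcodes(path):
    """Read one barcode per line into a set."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def iter_tag_values(bam_path, tag="CR", limit=100000):
    """Yield the tag value of each read (None if the tag is absent)."""
    import pysam  # imported lazily so summarize_tags works without pysam

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        # until_eof=True iterates over all reads without needing an index.
        for i, read in enumerate(bam.fetch(until_eof=True)):
            if i >= limit:
                break
            yield read.get_tag(tag) if read.has_tag(tag) else None


def summarize_tags(tag_values, barcodes):
    """Count reads with no tag, with a listed barcode, or with an unlisted one."""
    counts = Counter()
    for value in tag_values:
        if value is None:
            counts["missing_tag"] += 1
        elif value in barcodes:
            counts["in_barcode_list"] += 1
        else:
            counts["not_in_barcode_list"] += 1
    return dict(counts)


# Hypothetical usage with the paths shown in the log above:
# barcodes = load_barcodes(
#     "/home/dm6/MATES_test4/STARsolo/sampleSolo.out/Gene/filtered/barcodes.tsv")
# values = iter_tag_values("./unique_read/sample_uniqueread.bam", tag="CR")
# print(summarize_tags(values, barcodes))
```

If nearly every read lands in missing_tag or not_in_barcode_list, the barcode tag or the barcode file is a more likely culprit than the BAM path.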