yuntianf / LongcellPre

A pipeline for Nanopore single cell isoform quantification in R
MIT License

Something wrong when downloading the longcellpre #1

Closed yuanwsy closed 3 months ago

yuanwsy commented 8 months ago

Hi,

Thanks for developing this very useful and much needed tool! Is this version of LongcellPre OK to use yet? When I try to install LongcellPre, the installation fails with the following error:

Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) :
  there is no package called ‘Longcellsrc’
Calls: ... loadNamespace -> withRestarts -> withOneRestart -> doWithOneRestart
Execution halted
ERROR: lazy loading failed for package ‘LongcellPre’

Thanks for your help!

yuanwsy commented 8 months ago

I used devtools::install_github("yuntianf/Longcellsrc") to install Longcellsrc first, and then installed LongcellPre successfully. Another question: has the LongcellPre documentation been uploaded yet?

yuntianf commented 8 months ago

Hi yuan, I'm writing the documentation for this package, but I can first publish a simplified version so you can use it on your data. One thing to note is that I'm still testing this pipeline to make sure it is feasible, so it may run into bugs on some kinds of data. I will complete the tests as soon as possible in the next two days; if you want to use it before that and run into any problems, please don't hesitate to contact me and I will help debug.

Thanks

yuanwsy commented 7 months ago

Thank you! And I do have some problems when using this tool. I provide

gtf_path = "~/gencode.vM34.primary_assembly.annotation.gtf"
genome_path = "~/GRCm39.genome.fa.gz"
minimap_bed_path = "~/gencode.vM34.primary_assembly.annotation.bed"
genome_name = "m39"
toolkit = "3"

plus my data, barcodes, etc., and run a command like:

RunLongcellPre(fastq = fastq, barcode_path = barcodes, toolkit = toolkit,
               genome_path = genome_path, genome_name = genome_name,
               gtf_path = gtf_path, minimap_bed_path = minimap_bed_path, work_dir = work_dir,
               samtools = samtools, minimap2 = minimap2, bedtools = bedtools)

Then the output is:

[1] "Start to build exon annotation based on the gtf file."
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Error in mutate():
ℹ In argument: id = exon_id(unique(strand), n()).
ℹ In group 1: gene = "ENSMUSG00000000".
Caused by error in if (strand == "+") ...:
! the condition has length > 1
Run rlang::last_trace() to see where the error occurred.

And the output of rlang::last_trace():

rlang::last_trace(drop = FALSE)
<error/dplyr:::mutate_error>
Error in mutate():
ℹ In argument: id = exon_id(unique(strand), n()).
ℹ In group 1: gene = "ENSMUSG00000000".
Caused by error in if (strand == "+") ...:
! the condition has length > 1

Backtrace: ▆

  1. ├─LongcellPre::RunLongcellPre(...)
  2. │ ├─base::do.call(annotation, Param)
  3. │ └─LongcellPre (local) <fn>(...)
  4. │ └─LongcellPre:::createAnnotation(...)
  5. │ └─LongcellPre::gtf2bed(...)
  6. │ └─temp %>% group_by(gene) %>% ...
  7. ├─dplyr::mutate(., id = exon_id(unique(strand), n()))
  8. ├─dplyr:::mutate.data.frame(., id = exon_id(unique(strand), n()))
  9. │ └─dplyr:::mutate_cols(.data, dplyr_quosures(...), by)
  10. │ ├─base::withCallingHandlers(...)
  11. │ └─dplyr:::mutate_col(dots[[i]], data, mask, new_columns)
  12. │ └─mask$eval_all_mutate(quo)
  13. │ └─dplyr (local) eval()
  14. ├─LongcellPre:::exon_id(unique(strand), n())
  15. └─base::.handleSimpleError(...)
  16. └─dplyr (local) h(simpleError(msg, call))
  17. └─rlang::abort(message, class = error_class, parent = parent, call = error_call)

Do you know where the error comes from? Could it be the gtf file I provided?

yuntianf commented 7 months ago

Hi, thank you for pointing this out. I haven't benchmarked this pipeline with a mouse dataset, so there may be some unanticipated bugs. I have fixed the bug in the annotation function for the mouse gtf; it should work this time. Please let me know if you run into any other problems.

Thanks

yuanwsy commented 7 months ago

It seems that I have to install some packages manually when I update LongcellPre, because when I update the package it always reports something like this:

Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) :
  there is no package called ‘peakRAM’
Calls: ... loadNamespace -> withRestarts -> withOneRestart -> doWithOneRestart
Execution halted
ERROR: lazy loading failed for package ‘LongcellPre’

I have also got some warnings because some package functions conflict:

Warning: replacing previous import ‘dplyr::union’ by ‘igraph::union’ when loading ‘LongcellPre’
Warning: replacing previous import ‘dplyr::as_data_frame’ by ‘igraph::as_data_frame’ when loading ‘LongcellPre’
Warning: replacing previous import ‘dplyr::groups’ by ‘igraph::groups’ when loading ‘LongcellPre’

yuanwsy commented 7 months ago

And the conflict seems to break loading the library:

library(LongcellPre)
Error in completeSubclasses(classDef2, class1, obj, where) :
  trying to get slot "subclasses" from an object of a basic class ("NULL") with no slots
In addition: Warning messages:
1: replacing previous import ‘dplyr::union’ by ‘igraph::union’ when loading ‘LongcellPre’
2: replacing previous import ‘dplyr::as_data_frame’ by ‘igraph::as_data_frame’ when loading ‘LongcellPre’
3: replacing previous import ‘dplyr::groups’ by ‘igraph::groups’ when loading ‘LongcellPre’
Error: package or namespace load failed for ‘LongcellPre’:
 .onLoad failed in loadNamespace() for 'dbplyr', details:
  call: setClass(cl, contains = c(prevClass, "VIRTUAL"), where = where)
  error: error in contained classes ("character") for class “ident”; class definition removed from ‘dbplyr’

yuntianf commented 7 months ago

Hi, you can install the dependencies automatically while updating the package by running install_github("yuntianf/LongcellPre", dependencies = TRUE). The conflict warnings are because I import the whole of dplyr and igraph in the NAMESPACE; I will update it later, but it shouldn't affect installation or usage. As for the dbplyr error, I haven't met that one, and LongcellPre doesn't import that package, so I need to do some research to figure out this problem.
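For reference, a minimal install sequence looks roughly like this (a sketch; devtools itself is assumed to be installed already):

# install the Longcellsrc backend first, then LongcellPre together with its dependencies
devtools::install_github("yuntianf/Longcellsrc")
devtools::install_github("yuntianf/LongcellPre", dependencies = TRUE)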

yuntianf commented 7 months ago

I have updated the namespace and the warnings disappeared. I'm not sure whether it still collides with dbplyr; you could give it a try.

yuanwsy commented 7 months ago

Still the same problem. The whole process looks like this:

library(future)
library(Longcellsrc)
library(LongcellPre)

fastq = "~/fastq.gz"
barcodes = "~/barcodes.tsv.gz"
gtf_path = "~/gencode.vM34.primary_assembly.annotation.gtf"
genome_path = "~/GRCm39.genome.fa.gz"
minimap_bed_path = "~/gencode.vM34.primary_assembly.annotation.bed"
genome_name = "mm39"
toolkit = "3"
work_dir = "~/OE/"

RunLongcellPre(fastq = fastq, barcode_path = barcodes, toolkit = toolkit,
               genome_path = genome_path, genome_name = genome_name,
               gtf_path = gtf_path, minimap_bed_path = minimap_bed_path, work_dir = work_dir,
               samtools = samtools, minimap2 = minimap2, bedtools = bedtools)

[1] "Start to build exon annotation based on the gtf file."
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Error in mutate():
ℹ In argument: id = exon_id(unique(strand), n()).
ℹ In group 1: gene = "ENSMUSG00000000001".
Caused by error in n():
! could not find function "n"
Run rlang::last_trace() to see where the error occurred.
Warning message:
In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type stop_codon.
  This information was ignored.

It still seems inadequate for the mouse genome.

yuntianf commented 7 months ago

Hi, I didn't meet this problem in my testing, but I have updated the NAMESPACE to specifically import n() from dplyr. Please tell me if it still doesn't work.
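For reference, the change is just an explicit import of n(), roughly like this (roxygen tag plus the generated NAMESPACE directive; the exact wording in the repo may differ):

# in the R source (roxygen tag):
#' @importFrom dplyr n
# which roxygen2 turns into this NAMESPACE directive:
# importFrom(dplyr, n)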

yuanwsy commented 7 months ago

Thanks! It seems to work now:

RunLongcellPre(fastq = fastq, barcode_path = barcodes, toolkit = toolkit,
               genome_path = genome_path, genome_name = genome_name,
               gtf_path = gtf_path, minimap_bed_path = minimap_bed_path, work_dir = work_dir,
               samtools = samtools, minimap2 = minimap2, bedtools = bedtools)

[1] "Start to build exon annotation based on the gtf file."
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
LongcellPre would be applied with 1 threads in sequential mode.
Start to do barcode match: In total 67419192 reads out of 96853581 reads are identified with a valid adapter.
Within them, 67418347 reads are identified with a valid tag region.
start to build index
index finished!
The size of index is 3892

It has been stuck here for about 12 h, though. Is that expected?

yuntianf commented 7 months ago

Hi, glad to see it worked on your side. But your data seems quite large, so I would suggest first subsampling a smaller dataset, e.g. 100k reads, to test whether LongcellPre works on your data. Then you can run your whole dataset with parallelization by setting the cores and strategy parameters in RunLongcellPre. You can refer to those two parameters in the future package.
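For example, something like this (ShortRead is just one way to do the subsampling, and the cores/strategy values are placeholders to adjust for your machine):

# one way to make a ~100k-read test set (any fastq subsampling tool works just as well)
library(ShortRead)
sampler = FastqSampler("~/fastq.gz", n = 1e5)
writeFastq(yield(sampler), "~/fastq_100k.fq.gz")

# run LongcellPre on it (or later on the full data) with parallelization;
# cores and strategy follow the future package, e.g. "multisession" or "multicore"
RunLongcellPre(fastq = "~/fastq_100k.fq.gz", barcode_path = barcodes, toolkit = toolkit,
               genome_path = genome_path, genome_name = genome_name,
               gtf_path = gtf_path, minimap_bed_path = minimap_bed_path, work_dir = work_dir,
               samtools = samtools, minimap2 = minimap2, bedtools = bedtools,
               cores = 8, strategy = "multisession")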

rustino commented 7 months ago

For me, it worked with the mouse genome (at least on a small downsampled fastq as @yuntianf recommended; I haven't tried the full one) using https://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/latest/mm10.fa.gz and https://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/genes/mm10.ensGene.gtf.gz. With other versions of the genome, I had the same issue.

yuntianf commented 7 months ago

Hi @rustino, sorry, could you please specify what issues you met? Then I can figure out what happened behind them.

rustino commented 7 months ago

Sure @yuntianf, and sorry for the confusion.

I meant this problem:

fastq = "fastq.fastq.gz"
barcodes = "/white_list/737K-august-2016.txt"
gtf_path = "mm10.knownGene.gtf"
genome_path = "mm10.fa"
genome_name = "mm10"
toolkit = 3
work_dir = "/outdir/"

# specify the path for those tools
samtools = "/opt/conda/bin/samtools"
minimap2 = "/opt/conda/bin/minimap2"
bedtools = "/opt/conda/bin/bedtools"
> RunLongcellPre(fastq = fastq, barcode_path = barcodes,toolkit = toolkit,
               genome_path = genome_path,genome_name = genome_name,
               gtf_path = gtf_path, work_dir = work_dir,
               samtools = samtools, minimap2 = minimap2, bedtools = bedtools)

[1] "Start to build exon annotation based on the gtf file."
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Error in `mutate()`:
ℹ In argument: `id = exon_id(unique(strand), n())`.
ℹ In group 323: `gene = "A0A087WNV3"`.
Caused by error in `if (strand == "+") ...`:
! the condition has length > 1
Run `rlang::last_trace()` to see where the error occurred.
Warning message:
In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type
  stop_codon. This information was ignored.

It was resolved when I switched from mm10.knownGene.gtf to mm10.ensGene.gtf (I edited my previous reply; I'm going back and forth with this program, but it seems to be working now).

yuntianf commented 7 months ago

Hi @rustino, thanks for pointing this out! I tested this program only with Ensembl gene IDs, so I hadn't met this bug. I will test the annotation with other kinds of gene names to generalize it.

yuanwsy commented 7 months ago

Will there be some message telling me that RunLongcellPre has reached the end? Or does it finish after these lines:

There are 1012578 sequences identified with a barcode
There are 1010005 sequences identified with a barcode
There are 1012324 sequences identified with a barcode
There are 1009746 sequences identified with a barcode

yuntianf commented 7 months ago

Hi @yuanwsy, when LongcellPre finishes, it will print "LongcellPre pipeline finished!". The messages you showed indicate that the barcode match step has finished.

yuanwsy commented 7 months ago

I got an error that I haven't figured out how to fix yet. Do you know why?

[bam_sort_core] merging from 0 files and 64 in-memory blocks... ***** WARNING: File /data/R02/yuanwsy/longcellpre/OEsubsample3//annotation/cover.bed has inconsistent naming convention for record: GL456210.1 157534 157862

***** WARNING: File /data/R02/yuanwsy/longcellpre/OEsubsample3//annotation/cover.bed has inconsistent naming convention for record: GL456210.1 157534 157862

***** WARNING: File /data/R02/yuanwsy/longcellpre/OEsubsample3//annotation/temp.bed has inconsistent naming convention for record: GL456354.1 25923 27230 - ENSMUSG00000074720

***** WARNING: File /data/R02/yuanwsy/longcellpre/OEsubsample3//annotation/temp.bed has inconsistent naming convention for record: GL456354.1 25923 27230 - ENSMUSG00000074720

Start to extract isoforms: Error in .getOneSeqFromBSgenomeMultipleSequences(x, names[i], start[i], : sequence JH584304.1 not found

yuntianf commented 7 months ago

Hi @yuanwsy, I think this is an error raised by getSeq from the BSgenome package. I guess JH584304.1 is passed to the names parameter, which should represent the chromosome. Could you please check whether JH584304.1 is in the chr column of your bed file? And please keep the genome_name in RunLongcellPre the same version as the genome you mapped to.
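A quick way to check from R (the BSgenome package name below is my assumption for mm39, and the bed path is the one from your log; adjust both to your setup):

# is the contig a known sequence in the BSgenome that getSeq uses?
library(BSgenome.Mmusculus.UCSC.mm39)   # assumed genome package for genome_name = "mm39"
"JH584304.1" %in% seqnames(BSgenome.Mmusculus.UCSC.mm39)

# which contigs appear in the chr column of the annotation bed?
bed = read.table("/data/R02/yuanwsy/longcellpre/OEsubsample3/annotation/cover.bed")
unique(bed[[1]])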

yuanwsy commented 7 months ago

Yeah, I found it in genecover.bed:

JH584304.1 52190 59690 - ENSMUSG00000095041

So JH584304.1 is in the chr column.

And I have kept the genome_name in RunLongcellPre the same version as the genome I mapped to (GRCm39 / mm39). So do I have to delete that line, or do something else?

yuntianf commented 7 months ago

Could you check your bam file header to see if it also contains JH584304.1? I guess JH584304.1 doesn't exist in the BSgenome, so if you are not specifically interested in this gene, you could delete it from the gene bed.
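Both checks and the filtering can be done from R, roughly like this (Rsamtools and the file paths are assumptions on my side; adjust them to yours):

# does the bam header list the contig?
library(Rsamtools)
header = scanBamHeader("path/to/your_mapped.bam")[[1]]
"JH584304.1" %in% names(header$targets)

# drop entries on that contig from the gene bed and write a filtered copy
bed = read.table("/data/R02/yuanwsy/longcellpre/OEsubsample3/annotation/genecover.bed", header = FALSE)
bed = bed[bed[[1]] != "JH584304.1", ]
write.table(bed, "/data/R02/yuanwsy/longcellpre/OEsubsample3/annotation/genecover.filtered.bed",
            sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)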

yuanwsy commented 7 months ago

I tried another annotation gtf and it works, but I met another problem:

Start to extract isoforms: Isoform extraction took 3.30 mins

  Elapsed_Time_sec Total_RAM_Used_MiB Peak_RAM_Used_MiB
1          188.583                6.1             740.9

Error in $<-:
! Assigned data rep(unit, (nrow(data)%/%length(unit) + 1))[1:nrow(data)] must be compatible with existing data.
✖ Existing data has 7210 rows.
✖ Assigned data has 36464 rows.
ℹ Only vectors of size 1 are recycled.
Caused by error in vectbl_recycle_rhs_rows():
! Can't recycle input of size 36464 to size 7210.
Run rlang::last_trace() to see where the error occurred.

rlang::last_trace()
<error/tibble_error_assign_incompatible_size>
Error in $<-:
! Assigned data rep(unit, (nrow(data)%/%length(unit) + 1))[1:nrow(data)] must be compatible with existing data.
✖ Existing data has 7210 rows.
✖ Assigned data has 36464 rows.
ℹ Only vectors of size 1 are recycled.
Caused by error in vectbl_recycle_rhs_rows():
! Can't recycle input of size 36464 to size 7210.

Backtrace: ▆

  1. └─LongcellPre::RunLongcellPre(...)
  2. ├─base::do.call(umi_count_corres, Param)
  3. └─LongcellPre (local) <fn>(...)
  4. └─LongcellPre:::genes_distribute(data, cores, gene)
  5. ├─base::$<-(*tmp*, "group", value = <int>)
  6. └─tibble:::$<-.tbl_df(*tmp*, "group", value = <int>)
  7. └─tibble:::tbl_subassign(...)
  8. └─tibble:::vectbl_recycle_rhs_rows(value, fast_nrow(xo), i_arg = NULL, value_arg, call)

rustino commented 7 months ago

I got exactly the same error as @yuanwsy (I copy the first part of it here).

Error in $<-:
! Assigned data rep(unit, (nrow(data)%/%length(unit) + 1))[1:nrow(data)] must be compatible with existing data.
✖ Existing data has 13774 rows.
✖ Assigned data has 2032053 rows.
ℹ Only vectors of size 1 are recycled.
Caused by error in vectbl_recycle_rhs_rows():
! Can't recycle input of size 2032053 to size 13774.

yuntianf commented 7 months ago

Hi @rustino @yuanwsy, thanks for pointing this out. This was a bug in the parallelization and I have fixed it; now it should work.

rustino commented 7 months ago

Hi @yuntianf, just letting you know that the problem with parallelization is solved. I'll let you know if I run into more issues. Thanks!

yuanwsy commented 7 months ago

It seems to take up too much memory to run the whole pipeline at once on my data, especially the barcode match step. Do you have any recommendation for how to run it? For example, can this step be separated into several smaller steps?

yuntianf commented 7 months ago

Hi @yuanwsy, this is expected since you have a really large dataset, and currently LongcellPre stores the barcode matching result as an intermediate variable, which leads to large memory usage. I could optimize it to read the fastq and store the result stage by stage, but that takes time. For your time efficiency, I would recommend first splitting your fastq file into smaller sub-fastqs. Then you can run LongcellPre in separate steps for each sub-fastq:

### In R
# replace your input path here
fastq = "sub_fastq.fq.gz"
barcodes = "path of your input cell barcode whitelist"
gtf_path = "path of your gtf annotation"
genome_path = "path of your genome annotation"
minimap_bed_path = "path of your bed annotation for minimap2, can be generated from gtf" # optional
genome_name = "the genome name used for mapping, ex. hg38"
toolkit = 3 # your 10X sequencing toolkit
work_dir = "The output directory"

# create annotation
init(work_dir)
refer = annotation(gtf_path = gtf_path, work_dir = work_dir ,overwrite = FALSE)
gene_bed = refer[[1]]
gtf = refer[[2]]

# barcode match
plan(strategy = "multisession",workers = 4)
reads_bc = reads_extract_bc(fastq_path = fastq, barcode_path = barcodes , gene_bed = gene_bed,
                                              genome_path = genome_path, genome_name = genome_name, toolkit = toolkit,
                                              minimap_bed_path = minimap_bed_path, work_dir = work_dir, adapter = NULL)

Then, in shell, you can concatenate the outputs from the barcode match step and read the result into R as the input for the UMI deduplication step.

### In R
reads_bc = read.table(file.path(work_dir,"BarcodeMatch/BarcodeMatch.txt"),header = TRUE,sep = "\t")
qual = read.table(file.path(work_dir,"BarcodeMatch/adapterNeedle.txt"), header = TRUE, sep = "\t")

plan(strategy = "multisession",workers = 4)
umi_count_corres(data = reads_bc,qual = qual,dir = file.path(work_dir,"/out/"),gene_bed = gene_bed,gtf = gtf)
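If you prefer to stay in R instead of concatenating in shell, the combining step could look roughly like this (sub_dirs is a placeholder for the per-sub-fastq work directories you used):

# combine the barcode match outputs of all sub-fastqs before UMI deduplication
sub_dirs = c("sub1_workdir", "sub2_workdir")   # placeholder paths, one work_dir per sub-fastq
reads_bc = do.call(rbind, lapply(sub_dirs, function(d)
  read.table(file.path(d, "BarcodeMatch/BarcodeMatch.txt"), header = TRUE, sep = "\t")))
qual = do.call(rbind, lapply(sub_dirs, function(d)
  read.table(file.path(d, "BarcodeMatch/adapterNeedle.txt"), header = TRUE, sep = "\t")))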

Hope the above steps help with the memory problem.

Best

yuanwsy commented 7 months ago

Thanks for your help! But when I try to separate the steps, I use readRDS to read the annotation file for the barcode match step, and if I do so, the barcode match step runs into an error like this:

gene_bed <- readRDS("/gene_bed.rds")
Start to do barcode match: Error: Not compatible with requested type: [type=character; target=integer].
Timing stopped at: 0.002 0 0.002

yuanwsy commented 7 months ago

And when I only run the annotation step, it also comes to an error:

refer = annotation(gtf_path = gtf_path, work_dir = work_dir, overwrite = FALSE)
[1] "Start to build exon annotation based on the gtf file."
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Error in file(file, ifelse(append, "a", "w")) : cannot open the connection

yuntianf commented 7 months ago

Thanks for your help! But when I try to separate the steps, I use readRDS to read the annotation file for the barcode match step, and if I do so, the barcode match step runs into an error like this:

gene_bed <- readRDS("/gene_bed.rds")
Start to do barcode match: Error: Not compatible with requested type: [type=character; target=integer].
Timing stopped at: 0.002 0 0.002

For this one, could you please make sure the path for your input gene bed is correct, as /gene_bed.rds seems to be in the root path? If it's correct, could you please print the head of this file?

And when I only run the annotation step, it also comes to an error:

refer = annotation(gtf_path = gtf_path, work_dir = work_dir, overwrite = FALSE)
[1] "Start to build exon annotation based on the gtf file."
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Error in file(file, ifelse(append, "a", "w")) : cannot open the connection

And for this one, according to the output, it stops at the step of writing the output file. Could you please make sure you have permission to write to your output folder?
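A quick check from base R (work_dir below is whatever you passed to the annotation step):

# does the output directory exist, and is it writable?
dir.exists(work_dir)
file.access(work_dir, mode = 2) == 0   # TRUE means writable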

yuanwsy commented 7 months ago

Yes, the path is right, and the head of the file looks like this:

chr   start    end      width strand gene                id
chr1  3143476  3144545  1070  +      ENSMUSG00000102693  1
chr1  3172239  3172348  110   +      ENSMUSG00000064842  1

And for the second question, do you mean I should use the init(work_dir) command to make sure I have permission to write the output to the output folder, or something else?

yuntianf commented 7 months ago

init(work_dir) will create several subfolders to save the output. If you didn't run it, the output folders don't exist, so the result files can't be saved.

yuanwsy commented 7 months ago

init(work_dir) will create several subfolders to save the output. If you didn't run it, the output folders don't exist, so the result files can't be saved.

Thank you, I got it. And could you tell me which package contains the init() function? I haven't been able to find it.

yuntianf commented 7 months ago

I have now exported the init function in the NAMESPACE of LongcellPre. Please install the new package and try it again.
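Updating and then initializing the working directory should look roughly like this (force = TRUE just makes devtools reinstall even if the version number hasn't changed):

# reinstall the updated package; init() should then be visible after library()
devtools::install_github("yuntianf/LongcellPre", dependencies = TRUE, force = TRUE)
library(LongcellPre)
init(work_dir)   # creates the output subfolders used by the later steps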

yuanwsy commented 7 months ago

Still, there is something wrong when separating these steps. When I run the annotation generation step independently, the barcode match step gets this error:

Start to do barcode match: Error: Not compatible with requested type: [type=character; target=integer].
Timing stopped at: 0.001 0.001 0.024
Execution halted

It is the same error as above, but if I use the RunLongcellPre command the error does not appear.

yuntianf commented 7 months ago

Sorry, I found a typo in my instructions above: I had missed an "s" in the parameter barcode_path = barcodes in reads_extract_bc. I tested the separate steps and they worked on my side.

yuanwsy commented 7 months ago

After I updated the LongcellPre package, I met a new error, and none of my inputs have changed:

Start to do barcode match: In total 0 reads out of 0 reads are identified with a valid adapter.
Within them, 0 reads are identified with a valid tag region.
Error in names(x) <- value :
  'names' attribute [2] must be the same length as the vector [1]

It is weird. Do you know why?

yuntianf commented 7 months ago

After I updated the LongcellPre package, I met a new error, and none of my inputs have changed:

Start to do barcode match: In total 0 reads out of 0 reads are identified with a valid adapter.
Within them, 0 reads are identified with a valid tag region.
Error in names(x) <- value :
  'names' attribute [2] must be the same length as the vector [1]

It is weird. Do you know why?

It's hard to trace the bug from the error message alone. Could you please show me the code you used to run LongcellPre?

yuanwsy commented 7 months ago

library(future)
library(Longcellsrc)
library(LongcellPre)

fastq = "/data/fulllength.fastq.gz"
barcodes = "/data/barcodes.tsv.gz"
gtf_path = "/data/gencode.vM33.basic.annotation.gtf.gz"
genome_path = "/data/GRCm39.genome.fa.gz"
genome_name = "mm39"
toolkit = "3"
work_dir = "/data/fulloutput/"
samtools = "/usr/bin/samtools"
bedtools = "~/miniconda3/envs/R4.1.2/bin/bedtools"
minimap2 = "~/miniconda3/envs/R4.1.2/bin/minimap2"

RunLongcellPre(fastq = fastq, barcode_path = barcodes, toolkit = toolkit,
               genome_path = genome_path, genome_name = genome_name,
               gtf_path = gtf_path, work_dir = work_dir,
               samtools = samtools, minimap2 = minimap2, bedtools = bedtools,
               cores = 32, strategy = "multicore")

All of the code I ran is above. Thanks!

yuntianf commented 7 months ago

This is weird: I only made some modifications to the NAMESPACE, which shouldn't influence the barcode match step, and I couldn't reproduce your error on my side with my test data. If you could share a small subset of your fastq file, along with the cell barcodes and gene_bed.rds, I could do some further diagnosis.