Linda-Lan opened this issue 6 years ago
This doesn't look like a problem in velocyto but rather a problem with its dependencies, pysam/samtools. It could also be that the BAM file is corrupted or truncated. Does 'samtools view BAMFILE' work? Are you using conda?
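Here is a quick way to test the "file may be truncated" theory without pysam. Every BGZF file, BAM included, should end with a fixed 28-byte empty block defined in the SAM/BAM specification, and the "no BGZF EOF marker" error means that block is missing. This minimal sketch only checks those last 28 bytes. `has_bgzf_eof` is a hypothetical helper, not part of velocyto or pysam. A file that passes can still be damaged elsewhere, so 'samtools view' (or 'samtools quickcheck') remains the fuller test.

```python
import os

# The 28-byte empty BGZF block that the SAM/BAM specification requires at the
# end of every BGZF-compressed file (BAM files included).
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the standard BGZF EOF block.

    Hypothetical helper: it inspects only the trailing bytes, so it detects a
    truncated file but not corruption in the middle of the file.
    """
    if os.path.getsize(path) < len(BGZF_EOF):
        return False  # too short to contain the marker at all
    with open(path, "rb") as fh:
        fh.seek(-len(BGZF_EOF), os.SEEK_END)  # jump to the last 28 bytes
        return fh.read() == BGZF_EOF
```

For example, `has_bgzf_eof("outs/cellsorted_possorted_genome_bam.bam")` returning False would point to an incomplete sort rather than a velocyto bug.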
Hi gioelelm,
I then realized I needed to delete the old BAM files, since it seems velocyto skips this step if old BAM files exist. I re-ran it and it successfully generated the .loom file. I then ran the analysis following the Python tutorial, and it shows the error below. What should I put for ClusterName? Also, do you have a Docker image on DockerHub? Is it possible to have an analysis pipeline for RStudio?
[lindalan@midway-login1 velocyto]$ python analysis.py
/home/lindalan/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "analysis.py", line 14, in <module>
    vlm.set_clusters(vlm.ca["ClusterName"])
KeyError: 'ClusterName'
Good to hear that it worked. Sorry I didn't point out that the problem could be a previously corrupted sorted BAM file.
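A minimal sketch of clearing that stale sorted file before re-running velocyto run10x. Based on the directory listing in this thread, it assumes the file velocyto picks up again is `cellsorted_possorted_genome_bam.bam` in Cell Ranger's `outs` folder, next to `.tmp.NNNN.bam` chunks that an interrupted samtools sort likely left behind. `remove_stale_cellsorted` is a hypothetical helper, not a velocyto function. It deliberately matches only names starting with the `cellsorted_` prefix, so Cell Ranger's own `possorted_genome_bam.bam` (velocyto's input) and its `.bai` index are never touched.

```python
from pathlib import Path


def remove_stale_cellsorted(outs_dir, prefix="cellsorted_possorted_genome_bam.bam"):
    """Delete a previous cell-sorted BAM and any leftover sort chunks.

    Hypothetical helper: only regular files whose name starts with `prefix`
    are removed. Returns the sorted list of deleted file names.
    """
    removed = []
    # glob patterns are anchored at the start of the name, so this cannot
    # match "possorted_genome_bam.bam" itself
    for path in sorted(Path(outs_dir).glob(prefix + "*")):
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return removed
```

For the run in this thread, that would be `remove_stale_cellsorted("/project/wilsonp/linda/319-5_prime/outs")`, followed by the same run10x command.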
You can pass any vector of labels or cluster names. That line of code assumes that you performed a clustering step and stored the result in that column attribute of the loom file. But you can replace the column attribute with any numpy array: the information doesn't need to be stored in the loom file and can come from previous analyses.
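As an illustration of passing labels from a previous analysis, here is a minimal sketch that builds a numpy array of cluster labels in the same order as the cells in the loom. It assumes the loom stores cell IDs in the `CellID` column attribute in the form `sample:BARCODEx`, and that the earlier clustering exported barcodes in Cell Ranger's `BARCODE-1` form. Both formats are assumptions to check against your own files. `barcode_of` and `labels_for_cells` are hypothetical helpers. Cells without a label get a placeholder instead of raising an error.

```python
import numpy as np


def barcode_of(cell_id):
    """Reduce a cell ID to its bare barcode sequence.

    Assumed formats: velocyto "sample:ACGT...x" and Cell Ranger "ACGT...-1".
    """
    bc = cell_id.split(":")[-1]  # drop a "sample:" prefix if present
    if bc.endswith("x"):
        bc = bc[:-1]  # drop velocyto's trailing "x"
    return bc.split("-")[0]  # drop Cell Ranger's "-1" suffix


def labels_for_cells(cell_ids, barcode_to_label, missing="unassigned"):
    """Return a numpy array of labels aligned with `cell_ids` (same order)."""
    lookup = {barcode_of(bc): label for bc, label in barcode_to_label.items()}
    return np.array([lookup.get(barcode_of(cid), missing) for cid in cell_ids])


# Usage with velocyto (not run here; my_clusters is a hypothetical dict of
# barcode -> cluster loaded from an earlier analysis):
#   labels = labels_for_cells(vlm.ca["CellID"], my_clusters)
#   vlm.set_clusters(labels)
```

Building the labels from `vlm.ca["CellID"]` at the moment `set_clusters` is called should keep them aligned with any earlier cell filtering, assuming filtering also subsets `vlm.ca`.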
Thank you for your prompt reply. Is there anything in this script that may be unnecessary or cause an error?
import os
import numpy as np  # needed for np.percentile below
import matplotlib.pyplot as plt  # needed for plt.figure below
import velocyto as vcy
from sklearn.manifold import TSNE

vlm = vcy.VelocytoLoom("319-5_prime.loom")
vlm.normalize("S", size=True, log=True)
vlm.S_norm  # bare attribute access, has no effect in a script
vlm.plot_fractions()
vlm.dump_hdf5("my_velocyto_analysis")

vlm.filter_cells(bool_array=vlm.initial_Ucell_size > np.percentile(vlm.initial_Ucell_size, 0.5))
vlm.set_clusters(vlm.ca["ClusterName"])  # KeyError unless the loom has a "ClusterName" column attribute
vlm.score_detection_levels(min_expr_counts=40, min_cells_express=30)
vlm.filter_genes(by_detection_levels=True)
vlm.score_cv_vs_mean(3000, plot=True, max_expr_avg=35)
vlm.filter_genes(by_cv_vs_mean=True)

vlm._normalize_S(relative_size=vlm.S.sum(0), target_size=vlm.S.sum(0).mean())
vlm._normalize_U(relative_size=vlm.U.sum(0), target_size=vlm.U.sum(0).mean())

vlm.perform_PCA()
vlm.knn_imputation(n_pca_dims=20, k=500, balanced=True, b_sight=3000, b_maxl=1500, n_jobs=16)

vlm.fit_gammas()

# Mouse gene symbols from the dentate gyrus tutorial; a loom built on GRCh38 (human) uses upper-case symbols such as "PDGFRA"
vlm.plot_phase_portraits(["Igfbpl1", "Pdgfra"])

vlm.predict_U()
vlm.calculate_velocity()
vlm.calculate_shift(assumption="constant_velocity")
vlm.extrapolate_cell_at_t(delta_t=1.)
vlm.calculate_shift(assumption="constant_unspliced", delta_t=10)
vlm.extrapolate_cell_at_t(delta_t=1.)

bh_tsne = TSNE()
vlm.ts = bh_tsne.fit_transform(vlm.pcs[:, :25])
vlm.estimate_transition_prob(hidim="Sx_sz", embed="ts", transform="sqrt", psc=1, n_neighbors=3500, knn_random=True, sampled_fraction=0.5)
vlm.calculate_embedding_shift(sigma_corr=0.05, expression_scaling=True)

vlm.calculate_grid_arrows(smooth=0.8, steps=(40, 40), n_neighbors=300)
plt.figure(None, (20, 10))
vlm.plot_grid_arrows(quiver_scale=0.6, scatter_kwargs_dict={"alpha": 0.35, "lw": 0.35, "edgecolor": "0.4", "s": 38, "rasterized": True}, min_mass=24, angles='xy', scale_units='xy', headaxislength=2.75, headlength=5, headwidth=4.8, minlength=1.5, plot_random=True, scale_type="absolute")
Yes. Besides the line we discussed, the parameters also assume a dataset roughly the same size as the dentate gyrus one.
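Because the tutorial parameters assume a dataset about the size of the dentate gyrus one, a minimal sanity check is to cap the neighbour counts in the script at the number of cells actually present. This sketch only keeps the values valid (a neighbourhood cannot usefully contain more cells than the dataset has). It does not choose good values, which depend on the data. `clamp_neighbors`, `neighbors_for` and the `TUTORIAL_NEIGHBORS` table are hypothetical names. The numbers are the ones used in the script above.

```python
# Neighbour counts used in the script above (tutorial values).
TUTORIAL_NEIGHBORS = {
    "knn_imputation.k": 500,
    "estimate_transition_prob.n_neighbors": 3500,
    "calculate_grid_arrows.n_neighbors": 300,
}


def clamp_neighbors(requested, n_cells):
    """Cap a neighbour count at n_cells - 1 (every other cell), minimum 1."""
    if n_cells < 2:
        raise ValueError("need at least two cells to define neighbours")
    return max(1, min(int(requested), n_cells - 1))


def neighbors_for(n_cells):
    """Return the tutorial neighbour counts, capped for a dataset of n_cells."""
    return {name: clamp_neighbors(k, n_cells) for name, k in TUTORIAL_NEIGHBORS.items()}
```

With velocyto, `n_cells` could be taken as `vlm.S.shape[1]`, since the matrices are genes × cells (which is why the script sums over axis 0 to get per-cell totals).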
Sorry, so you mean deleting the BAM file (possorted_genome_bam.bam)?
Hi guys, how did you solve the "no BGZF EOF marker; file may be truncated" error? I deleted the old files but it didn't work.
Best
Hi velocyto team,
I ran velocyto run10x on the Cell Ranger count output with the following command, but the .loom file that was supposed to be generated is missing.
velocyto run10x /project/wilsonp/linda/319-5_prime /project/wilsonp/linda/refdata-cellranger-GRCh38-1.2.0/genes/genes.gtf
It shows this error:
[lindalan@midway-login2 sbatch]$ vim velocyto.slurm.e49102211
/home/lindalan/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "/home/lindalan/anaconda3/bin/velocyto", line 11, in <module>
    sys.exit(cli())
  File "/home/lindalan/anaconda3/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/lindalan/anaconda3/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/lindalan/anaconda3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/lindalan/anaconda3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/lindalan/anaconda3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/lindalan/anaconda3/lib/python3.6/site-packages/velocyto/commands/run10x.py", line 106, in run10x
    samtools_memory=samtools_memory, dump=dump, verbose=verbose, additional_ca=additional_ca)
  File "/home/lindalan/anaconda3/lib/python3.6/site-packages/velocyto/commands/_run.py", line 229, in _run
    results = exincounter.count(bamfile_cellsorted, multimap=multimap)  # NOTE: we would avoid some millions of if statements evalution if we write two function count and count_with output
  File "/home/lindalan/anaconda3/lib/python3.6/site-packages/velocyto/counter.py", line 754, in count
    for r in self.iter_alignments(bamfile, unique=not multimap):
  File "/home/lindalan/anaconda3/lib/python3.6/site-packages/velocyto/counter.py", line 249, in iter_alignments
    fin = pysam.AlignmentFile(bamfile)  # type: pysam.AlignmentFile
  File "pysam/libcalignmentfile.pyx", line 734, in pysam.libcalignmentfile.AlignmentFile.__cinit__
  File "pysam/libcalignmentfile.pyx", line 944, in pysam.libcalignmentfile.AlignmentFile._open
  File "pysam/libchtslib.pyx", line 366, in pysam.libchtslib.HTSFile.check_truncation
OSError: no BGZF EOF marker; file may be truncated
Here are the files generated so far:
[lindalan@midway-login2 319-5_prime]$ cd outs
[lindalan@midway-login2 outs]$ ls
analysis                                          cellsorted_possorted_genome_bam.bam.tmp.0036.bam
cellsorted_possorted_genome_bam.bam               cellsorted_possorted_genome_bam.bam.tmp.0037.bam
cellsorted_possorted_genome_bam.bam.tmp.0000.bam  cellsorted_possorted_genome_bam.bam.tmp.0038.bam
cellsorted_possorted_genome_bam.bam.tmp.0001.bam  cellsorted_possorted_genome_bam.bam.tmp.0039.bam
cellsorted_possorted_genome_bam.bam.tmp.0002.bam  cellsorted_possorted_genome_bam.bam.tmp.0040.bam
cellsorted_possorted_genome_bam.bam.tmp.0003.bam  cellsorted_possorted_genome_bam.bam.tmp.0041.bam
cellsorted_possorted_genome_bam.bam.tmp.0004.bam  cellsorted_possorted_genome_bam.bam.tmp.0042.bam
cellsorted_possorted_genome_bam.bam.tmp.0005.bam  cellsorted_possorted_genome_bam.bam.tmp.0043.bam
cellsorted_possorted_genome_bam.bam.tmp.0006.bam  cellsorted_possorted_genome_bam.bam.tmp.0044.bam
cellsorted_possorted_genome_bam.bam.tmp.0007.bam  cellsorted_possorted_genome_bam.bam.tmp.0045.bam
cellsorted_possorted_genome_bam.bam.tmp.0008.bam  cellsorted_possorted_genome_bam.bam.tmp.0046.bam
cellsorted_possorted_genome_bam.bam.tmp.0009.bam  cellsorted_possorted_genome_bam.bam.tmp.0047.bam
cellsorted_possorted_genome_bam.bam.tmp.0010.bam  cellsorted_possorted_genome_bam.bam.tmp.0048.bam
cellsorted_possorted_genome_bam.bam.tmp.0011.bam  cellsorted_possorted_genome_bam.bam.tmp.0049.bam
cellsorted_possorted_genome_bam.bam.tmp.0012.bam  cellsorted_possorted_genome_bam.bam.tmp.0050.bam
cellsorted_possorted_genome_bam.bam.tmp.0013.bam  cellsorted_possorted_genome_bam.bam.tmp.0051.bam
cellsorted_possorted_genome_bam.bam.tmp.0014.bam  cellsorted_possorted_genome_bam.bam.tmp.0052.bam
cellsorted_possorted_genome_bam.bam.tmp.0015.bam  cellsorted_possorted_genome_bam.bam.tmp.0053.bam
cellsorted_possorted_genome_bam.bam.tmp.0016.bam  cellsorted_possorted_genome_bam.bam.tmp.0054.bam
cellsorted_possorted_genome_bam.bam.tmp.0017.bam  cellsorted_possorted_genome_bam.bam.tmp.0055.bam
cellsorted_possorted_genome_bam.bam.tmp.0018.bam  cellsorted_possorted_genome_bam.bam.tmp.0056.bam
cellsorted_possorted_genome_bam.bam.tmp.0019.bam  cellsorted_possorted_genome_bam.bam.tmp.0057.bam
cellsorted_possorted_genome_bam.bam.tmp.0020.bam  cellsorted_possorted_genome_bam.bam.tmp.0058.bam
cellsorted_possorted_genome_bam.bam.tmp.0021.bam  cellsorted_possorted_genome_bam.bam.tmp.0059.bam
cellsorted_possorted_genome_bam.bam.tmp.0022.bam  cellsorted_possorted_genome_bam.bam.tmp.0060.bam
cellsorted_possorted_genome_bam.bam.tmp.0023.bam  cellsorted_possorted_genome_bam.bam.tmp.0061.bam
cellsorted_possorted_genome_bam.bam.tmp.0024.bam  cellsorted_possorted_genome_bam.bam.tmp.0062.bam
cellsorted_possorted_genome_bam.bam.tmp.0025.bam  cellsorted_possorted_genome_bam.bam.tmp.0063.bam
cellsorted_possorted_genome_bam.bam.tmp.0026.bam  cloupe.cloupe
cellsorted_possorted_genome_bam.bam.tmp.0027.bam  filtered_gene_bc_matrices
cellsorted_possorted_genome_bam.bam.tmp.0028.bam  filtered_gene_bc_matrices_h5.h5
cellsorted_possorted_genome_bam.bam.tmp.0029.bam  metrics_summary.csv
cellsorted_possorted_genome_bam.bam.tmp.0030.bam  molecule_info.h5
cellsorted_possorted_genome_bam.bam.tmp.0031.bam  possorted_genome_bam.bam
cellsorted_possorted_genome_bam.bam.tmp.0032.bam  possorted_genome_bam.bam.bai
cellsorted_possorted_genome_bam.bam.tmp.0033.bam  raw_gene_bc_matrices
cellsorted_possorted_genome_bam.bam.tmp.0034.bam  raw_gene_bc_matrices_h5.h5
cellsorted_possorted_genome_bam.bam.tmp.0035.bam  web_summary.html
Do you have any solutions? Thank you!