Closed ixxmu closed 5 months ago
1.初步质控看看结果如何
conda activate rna2
mkdir -p fastqc_output
nohup fastqc -t 6 -o fastqc_output CRR*/*.fq.gz &
2.使用multiqc合并质控结果
multiqc ./
multiqc的主要参数
--force -f Overwrite any existing reports │
│ --config -c Specific config file to load, after those in MultiQC dir / home dir / working │
│ dir. │
│ (PATH) │
│ --cl-config Specify MultiQC config YAML on the command line (TEXT) │
│ --filename -n Report filename. Use 'stdout' to print to standard out. (TEXT) │
│ --outdir -o Create report in the specified output directory. (TEXT) │
│ --ignore -x Ignore analysis files (GLOB EXPRESSION) │
│ --ignore-samples Ignore sample names (GLOB EXPRESSION) │
│ --ignore-symlinks Ignore symlinked directories and files │
│ --file-list -l Supply a file containing a list of file paths to be searched, one per row
ls | grep '^CRR' >sample.id
#批量获取样本名称
find ../CRA005925/ -mindepth 1 -type d|cut -d"/" -f3 >sample.id
conda activate rna2
#!/bin/bash
cat sample.id | while read id
do
mkdir ~/silicosis_wangchen/data/02_trimgalory/${id}
echo "${id}"
echo "++++++++===========================+++++++++++"
trim_galore --phred33 -q 20 --length 36 \
--stringency 3 --fastqc --paired --max_n 3 \
-o ~/silicosis_wangchen/data/02_trimgalory/${id} \
~/silicosis_wangchen/data/CRA005925/${id}/${id}_f1.fq.gz \
~/silicosis_wangchen/data/CRA005925/${id}/${id}_f2.fq.gz
done
chmod +x batch_trimgalore.sh
nohup bash batch_trimgalore.sh >trim_galore.log 2>&1 &
trimgalory 详细参数
-q/--quality <INT> Trim low-quality ends from reads in addition to adapter removal. For
RRBS samples, quality trimming will be performed first, and adapter
trimming is carried in a second round. Other files are quality and adapter
trimmed in a single pass. The algorithm is the same as the one used by BWA
(Subtract INT from all qualities; compute partial sums from all indices
to the end of the sequence; cut sequence at the index at which the sum is
minimal). Default Phred score: 20.
--phred33 Instructs Cutadapt to use ASCII+33 quality scores as Phred scores
(Sanger/Illumina 1.9+ encoding) for quality trimming. Default: ON.
--phred64 Instructs Cutadapt to use ASCII+64 quality scores as Phred scores
(Illumina 1.5 encoding) for quality trimming.
--fastqc Run FastQC in the default mode on the FastQ file once trimming is complete.
--fastqc_args "<ARGS>" Passes extra arguments to FastQC. If more than one argument is to be passed
to FastQC they must be in the form "arg1 arg2 etc.". An example would be:
--fastqc_args "--nogroup --outdir /home/". Passing extra arguments will
automatically invoke FastQC, so --fastqc does not have to be specified
separately.
-a/--adapter <STRING> Adapter sequence to be trimmed. If not specified explicitly, Trim Galore will
try to auto-detect whether the Illumina universal, Nextera transposase or Illumina
small RNA adapter sequence was used. Also see '--illumina', '--nextera' and
'--small_rna'. If no adapter can be detected within the first 1 million sequences
of the first file specified or if there is a tie between several adapter sequences,
Trim Galore defaults to '--illumina' (as long as the Illumina adapter was one of the
options, else '--nextera' is the default). A single base
may also be given as e.g. -a A{10}, to be expanded to -a AAAAAAAAAA.
At a special request, multiple adapters can also be specified like so:
-a " AGCTCCCG -a TTTCATTATAT -a TTTATTCGGATTTAT"
-a2 " AGCTAGCG -a TCTCTTATAT -a TTTCGGATTTAT", or so:
-a "file:../multiple_adapters.fa"
-a2 "file:../different_adapters.fa"
Potentially in conjucntion with the parameter "-n 3" to trim all adapters. Please note
that this is NOT needed for standard trimming!
More Information here: https://github.com/FelixKrueger/TrimGalore/issues/86
--max_length <INT> Discard reads that are longer than <INT> bp after trimming. This is only advised for
smallRNA sequencing to remove non-small RNA sequences.
--stringency <INT> Overlap with adapter sequence required to trim a sequence. Defaults to a
very stringent setting of 1, i.e. even a single bp of overlapping sequence
will be trimmed off from the 3' end of any read.
--length <INT> Discard reads that became shorter than length INT because of either
quality or adapter trimming. A value of '0' effectively disables
this behaviour. Default: 20 bp.
For paired-end files, both reads of a read-pair need to be longer than
<INT> bp to be printed out to validated paired-end files (see option --paired).
If only one read became too short there is the possibility of keeping such
unpaired single-end reads (see --retain_unpaired). Default pair-cutoff: 20 bp.
--max_n COUNT The total number of Ns a read may contain before it will be removed altogether.
In a paired-end setting, either read exceeding this limit will result in the entire
pair being removed from the trimmed output files. If COUNT is a number between 0 and 1,
it is interpreted as a fraction of the read length.
在Linux中,你可以使用ls
命令与通配符来获取当前目录下所有以"CRR"开头的文件名称。具体命令如下:
ls CRR*
此命令会列出当前目录中所有以"CRR"开头的文件和目录。
如果你只想列出文件名,可以结合grep和ls命令来过滤:
ls | grep '^CRR'
此命令会列出当前目录中所有以"CRR"开头的文件和目录名称。
如果你只想要文件而不包含目录,可以使用find命令:
find . -maxdepth 1 -type f -name 'CRR*'
此命令会列出当前目录(不包括子目录)中所有以"CRR"开头的文件。
4.Hista2比对
cat >map.sh
#!/bin/bash
#获取样本名称,其实就是每个fastq文件的前缀
ls ../CRA005925/ |grep '^CRR' |while read i
do
echo ${i}
echo "==========*****************======="
hisat2 -x ~/20240125_rnaseq/data/4086673388_YoungLeeyyL/_/index/grcm38/genome -p 20 --summary-file ./${i}.mapinfo.txt \
-1 ~/silicosis_wangchen/data/CRA005925/${i}/${i}_f1.fq.gz \
-2 ~/silicosis_wangchen/data/CRA005925/${i}/${i}_r2.fq.gz \
|samtools sort -@ 20 -o ${i}.sorted.bam
done
chmod +x map.sh
nohup bash map.sh >map.log 2>&1 &
重要参数
--phred33 qualities are Phred+33 (default)
--phred64 qualities are Phred+64
-p/--threads <int> number of alignment threads to launch (1)
Usage:
hisat2 [options]* -x <ht2-idx> {-1 <m1> -2 <m2> | -U <r>} [-S <sam>]
<ht2-idx> Index filename prefix (minus trailing .X.ht2).
<m1> Files with #1 mates, paired with files in <m2>.
Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
<m2> Files with #2 mates, paired with files in <m1>.
Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
<r> Files with unpaired reads.
Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
<sam> File for SAM output (default: stdout)
<m1>, <m2>, <r> can be comma-separated lists (no whitespace) and can be
specified many times. E.g. '-U file1.fq,file2.fq -U file3.fq'.
cat >count.sh
#!/bin/bash
featureCounts -a ~/20240125_rnaseq/data/4086673388_YoungLeeyyL/_/index/Mus_musculus.GRCm38.102.gtf \
--extraAttributes gene_name,gene_biotype -T 60 \
-t exon -p -o ./all_samples_counts.txt ~/silicosis_wangchen/data/03map-data/*.bam
重要参数:
Usage: featureCounts [options] -a <annotation_file> -o <output_file> input_file1 [input_file2] ...
## Mandatory arguments:
-a <string> Name of an annotation file. GTF/GFF format by default. See
-F option for more format information. Inbuilt annotations
(SAF format) is available in 'annotation' directory of the
package. Gzipped file is also accepted.
-o <string> Name of output file including read counts. A separate file
including summary statistics of counting results is also
included in the output ('<string>.summary'). Both files
are in tab delimited format.
input_file1 [input_file2] ... A list of SAM or BAM format files. They can be
either name or location sorted. If no files provided,
<stdin> input is expected. Location-sorted paired-end reads are automatically sorted by read names.
-T <int> Number of the threads. 1 by default.
# Parameters specific to paired end reads
-p If specified, libraries are assumed to contain paired-end
reads. For any library that contains paired-end reads, the
'countReadPairs' parameter controls if read pairs or reads
should be counted.
不可查看链接:
马拉松课程
https://mp.csdn.net/mp_blog/creation/editor/135890652
jimmy2022
https://mp.csdn.net/mp_blog/creation/editor/128177352
https://mp.weixin.qq.com/s/uVMKL1Us4YWDVen2JB4kGQ