sr320 / course-fish546-2015

Bioinformatics for Environmental Sciences
https://github.com/sr320/fish546-2015/wiki

[quiz] Help me please. #35

Closed sr320 closed 9 years ago

sr320 commented 9 years ago

What is one thing you are having trouble with, that if solved would have the greatest impact on your analysis workflow?

kubu4 commented 9 years ago

I can't figure out how to annotate my files. Assuming I don't provide the Ensembl GTF file when processing with TopHat (I could do this, but I'm trying to avoid re-running everything I've already done), how can I use the downstream GTF files generated by Cufflinks and/or Cuffmerge for annotation?

I'm looking through BEDtools, but can't figure out how to proceed.

sr320 commented 9 years ago

Can you provide a URL to said GTF?

— Steven Roberts


kubu4 commented 9 years ago

I'm not even sure which GTF to use. What do I do with them?

Here're the GTFs my workflow has generated:

https://github.com/kubu4/fish546_2015/blob/master/gigasHSrnaSeq/analysis/cufflinks_preHS/transcripts.gtf

https://github.com/kubu4/fish546_2015/blob/master/gigasHSrnaSeq/analysis/cufflinks_postHS/transcripts.gtf

https://github.com/kubu4/fish546_2015/blob/master/gigasHSrnaSeq/analysis/cuffmerge/merged_asm/merged.gtf

sr320 commented 9 years ago

@kubu4 here is a start

!/Applications/bedtools2/bin/intersectBed \
-a ./data/gtf/merged.gtf \
-b ./data/gtf/Crassostrea_gigas.GCA_000297895.1.22.gtf \
-loj \
> ./analyses/sw-inter_ensemble.tab

This will find all annotated features in the Ensembl file that intersect with your merged assembly.

-loj    Perform a "left outer join". That is, for each feature in A
        report each overlap with B.  If no overlaps are found, 
        report a NULL feature for B.

This will get you the original nine GTF columns plus just the extra column with the CGI numbers (column 18):

!awk -F'\t' '{print $1,$2,$3,$4,$5,$6,$7,$8,$9,$18}' ../analyses/sw-inter_ensemble.tab


The other cool thing would be to get those features that do not overlap, using:

-v  Only report those entries in A that have _no overlaps_ with B.
        - Similar to "grep -v" (an homage).

Then you could use getfasta to pull the novel sequences to BLAST.
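
Roughly, something like this (the output name sw-novel.gtf is just a placeholder):

    # keep only the merged-assembly features with no overlap in the Ensembl annotation
    /Applications/bedtools2/bin/intersectBed \
      -a ./data/gtf/merged.gtf \
      -b ./data/gtf/Crassostrea_gigas.GCA_000297895.1.22.gtf \
      -v \
      > ./analyses/sw-novel.gtf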

kubu4 commented 9 years ago

Thanks. Is there any way to do this without the Ensembl GTF?

sr320 commented 9 years ago

Just use getfasta and blast.
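
Roughly along these lines (the genome FASTA path and the BLAST database name below are placeholders, and this assumes BLAST+ is installed on the machine):

    # pull sequences for the (novel) features
    /Applications/bedtools2/bin/bedtools getfasta \
      -fi ./data/genome/Crassostrea_gigas.fa \
      -bed ./analyses/sw-novel.gtf \
      -fo ./analyses/sw-novel.fa

    # BLAST them against a protein database (tabular output, best hit only)
    blastx \
      -query ./analyses/sw-novel.fa \
      -db uniprot_sprot \
      -outfmt 6 \
      -max_target_seqs 1 \
      -out ./analyses/sw-novel_blastx.tab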

sr320 commented 9 years ago

@kubu4 what kind of expression data do you have for each of the features in the merged gtf?

kubu4 commented 9 years ago

OK, I'll give getfasta and blast a whirl.

Unfortunately, I don't understand the next question. According to the Cufflinks documentation, the Cuffmerge GTF is a "master" transcriptome assembly and is required for differential expression analysis.

kubu4 commented 9 years ago

Remember when you had problems getting BEDtools installed? Do you know what you did to fix the issue? I'm having the same problem after I run "make" on Hummingbird. It's as though bedtools isn't installed, even when I'm trying to run it within the bedtools/bin directory (which is where the actual bedtools script file is located).

kubu4 commented 9 years ago

Got it working. Did two things, but I'm not sure which one triggered it to work.

  1. Did a fresh install, but used "make all" instead of just "make".
  2. Although I did that, I still couldn't launch bedtools. Tried "sudo bedtools" and that worked. Now bedtools is usable, even without running sudo.

sr320 commented 9 years ago

That code above was run on Hummingbird.

kubu4 commented 9 years ago

I figured it might have been. But I noticed it was in your usr/Applications directory and wanted to get a global install configured so that everyone can use it.

sr320 commented 9 years ago

It is in /Applications (not usr) so I assumed it would work across users. It didn't?
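
If other accounts still can't see it, a minimal sketch for making it available machine-wide (assuming the binary really is at /Applications/bedtools2/bin/bedtools) is to symlink it into /usr/local/bin:

    # put bedtools on a path every user already has
    sudo ln -s /Applications/bedtools2/bin/bedtools /usr/local/bin/bedtools

    # sanity check from any account
    which bedtools
    bedtools --version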

jheare commented 9 years ago

I really need help creating a tab-delimited file for STACKS. Right now I can't specify sample names or which population they are from, because my tab-delimited files will not work with STACKS as described.
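
(For context, a rough sketch of what these files usually look like; the barcodes, sample names, and population labels below are made up, <TAB> stands for a real tab character, and the exact barcode-file layout depends on the STACKS version. The process_radtags barcode file is one barcode per line, optionally followed by a tab and a sample name, while the population map is sample name, tab, population.)

    # barcodes.txt for process_radtags (placeholder values)
    AACGTT<TAB>sample_01
    GGTCAA<TAB>sample_02

    # popmap.txt for the populations step
    sample_01<TAB>pop_A
    sample_02<TAB>pop_B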

sr320 commented 9 years ago

@jheare url for tab file that is not working?

jheare commented 9 years ago

The two versions I've used with no success are:

version 1, version 2, and this original version.

None of these have worked for STACKS, even though they seem to be in the expected format.

sr320 commented 9 years ago

@jheare the problem is you have a space before the ID.

Not sure how you obtained the first file, but for now: cut each column, get rid of any spaces, paste them back together, and voila.

(Screenshot attached: terminal session working on the process_radtags TAB file, alongside a Stack Overflow page on deleting the first N characters of a line.)

Though in theory a simple sed to get rid of all the spaces should work.
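
For example, something along these lines (the file names are placeholders):

    # strip every space character, leaving the tabs alone
    sed 's/ //g' barcodes_original.txt > barcodes_clean.txt

    # sanity check: tabs show up as ^I and line ends as $
    cat -et barcodes_clean.txt | head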

jheare commented 9 years ago

@sr320 which file are you using? I've tried this on the original and the two other files listed. None of them work with process_radtags still.

sr320 commented 9 years ago

Give me a URL to a notebook where you tried to remove the space and create a new barcode tag file (and it still failed).

Also, I would not get hung up on this this week; make sure the product is complete, with the data you ran through last week.

Steven Roberts


jheare commented 9 years ago

I've figured out what you did with the file, but I'm having a hard time replicating it. How did you cut each column into its own file?

Also, I can't complete the product without the files. STACKS will create output files without the sample names and without the population map, but they are essentially meaningless, as STACKS can't create population catalogs or statistics due to the missing information.

sr320 commented 9 years ago

To cut a column from a file:

!cut -f3 ./data/yourfile

-f3 cuts field 3

-f FIELD-LIST
--fields=FIELD-LIST
     Print only the fields listed in FIELD-LIST.  Fields are separated
     by a TAB character by default.
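
Putting that together with paste, a rough sketch for rebuilding the barcode file (the column numbers and file names are guesses; adjust them to your actual file):

    # pull each column out on its own, stripping stray spaces as you go
    cut -f1 ./data/yourfile | sed 's/ //g' > col1.tmp
    cut -f2 ./data/yourfile | sed 's/ //g' > col2.tmp

    # paste glues the cleaned columns back together, tab-delimited by default
    paste col1.tmp col2.tmp > barcodes_clean.txt
    rm col1.tmp col2.tmp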

sr320 commented 9 years ago

@jheare re product: just go through each command (process_radtags and the three "stacks" programs), describe what each one does, and describe in detail each data product that arises from these commands. This includes all the stats that are output as standard error when running.

In other words, do not take it through to creating population catalogs.

jheare commented 9 years ago

Here is what I tried to do to replicate the way you produced the barcode file. This did not produce a working file as I still get the error saying the barcodes are invalid.

I'll keep writing up the product draft based on your suggestions here.

jheare commented 9 years ago

@sr320 I just realized I made an error while running STACKS and fixed it in my notebook. Now it can actually run through to the populations command. It produces statistics and VCF files; not that they are useful right now, but they should be in the future.

willking2 commented 9 years ago

"Will, put something here so that you get credit for it." -Dr. Steven Roberts

jldimond commented 9 years ago

Joining datasets has been the toughest for me. I've had trouble using the bash join command, so I spent a little bit of time trying to get SQLite to run in iPython. But then I realized that SQLite needs your files to be SQL databases, and that seemed like a hassle (though maybe it isn't). So then I just reverted to using SQLshare. This has generally worked except for one day when it was running like molasses in Antarctica.

But I think I am starting to understand how I can better use bash join, thanks to your examples. At some point I hope to incorporate these into my workflow.
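
(For reference, the basic pattern looks roughly like this; the file names are placeholders, and both inputs have to be sorted on the join field first.)

    # join two tab-delimited files on their first column
    sort -k1,1 annotations.tab > annotations.sorted
    sort -k1,1 expression.tab > expression.sorted
    join -t $'\t' -1 1 -2 1 annotations.sorted expression.sorted > joined.tab

    # -a 1 also keeps lines from the first file with no match (like a left outer join)
    join -t $'\t' -a 1 annotations.sorted expression.sorted > joined_loj.tab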

I should also mention that I spent a ridiculous amount of time dealing with Git a few weeks ago, but I now seem to have things under control.

Meanwhile, I'm doing most of this stuff on my new Linux OS, which has taken a bit of getting used to. Sometimes I do get jealous of you Mac users.

lisa418 commented 9 years ago

I'm having trouble downloading/moving files around in GitHub (i.e., downloading from your repository and adding to mine). I also seem to have lost my raw sequencing files :( There's a link to them in my notebook, but I can't find them in my GitHub data folder anymore.