wtsi-hgi / ebi-ngs-workshop

HGI's materials for the EBI Next Generation Sequencing (NGS) workshop

Students encounter "path confusion" #5

Open colin-nolan opened 8 years ago

colin-nolan commented 8 years ago

The most frequent issue I observed yesterday was "path confusion". Numerous students ran commands with invalid paths (e.g. the wrong number of "../", misspellings, or an absolute path pointing at the wrong location), resulting in confusing error messages and/or zero-length output files that were assumed to be the desired output. These corrupt output files became the source of much puzzlement when they caused issues in later steps.

It would be useful to students if we could reduce the "path confusion" they experience.
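
To make the zero-length output problem concrete, here is a minimal sketch of a wrapper that refuses to run when an input path is missing and discards an output file that comes back empty. The function name `run_checked` and its arguments are hypothetical and not part of the workshop materials; it assumes the command writes its result to standard output.

```python
import subprocess
from pathlib import Path


def run_checked(cmd, inputs, output):
    """Run cmd with stdout redirected to output, failing early on bad paths."""
    # Report every missing input up front instead of letting the tool fail later.
    missing = [str(p) for p in map(Path, inputs) if not p.exists()]
    if missing:
        raise FileNotFoundError(f"input path(s) not found: {', '.join(missing)}")

    output = Path(output)
    with output.open("wb") as handle:
        result = subprocess.run(cmd, stdout=handle)

    # A redirect creates the output file even when the command fails,
    # so remove it rather than leave an empty file behind.
    if result.returncode != 0 or output.stat().st_size == 0:
        output.unlink(missing_ok=True)
        raise RuntimeError(f"command failed or produced no output: {cmd}")
    return output
```

The point of the sketch is only that the failure surfaces at the step that caused it, rather than several steps later.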

colin-nolan commented 8 years ago

Possible aids:

One student suggested that the obvious solution to making things less confusing was to just copy the reference into every directory :P.

Xophmeister commented 8 years ago

You could symlink the reference into every directory and wrap that up in the tar file

jrandall commented 8 years ago

I don't think copying the reference (or even linking it) would be reasonable. Learning how to work in a manner which does not result in a needless proliferation of large data files is an important aspect of the work. Perhaps we should be more explicit about the concept of referring to files outside of the working directory. A diagram could perhaps help with that.

I fear employing your other suggestion, using environment variables to point to the various paths, could lead to even greater confusion. It might be worth a try next time around to see how it goes, but if we don't explain what they are doing I fear they will lose the plot and have no idea what is actually happening.
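
For reference, a rough sketch of what the environment-variable approach might look like if it were tried. The helper `path_from_env` and any variable name passed to it (e.g. `WORKSHOP_REF_DIR`) are made up for illustration. It fails loudly when the variable is unset or does not point to a directory, which is one way to keep the indirection from becoming too "magical".

```python
import os
from pathlib import Path


def path_from_env(name):
    """Return the directory named by environment variable `name`, or fail loudly."""
    value = os.environ.get(name)
    if not value:
        raise KeyError(f"environment variable {name} is not set")

    # Resolve to an absolute path so the student can see exactly where it points.
    path = Path(value).expanduser().resolve()
    if not path.is_dir():
        raise NotADirectoryError(f"{name}={value} does not point to a directory")
    return path
```

Printing the resolved path each time it is used would at least show students what the variable stands for.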

Some of the people with path problems had also overinterpreted how to expand the tar file, and had ended up changing to the shared penelope folder and expanding it there instead of in their home directory. That resulted in extra confusion later, as several students were then working in the same shared folder and were constantly overwriting each other's work. I've already asked the IT guy if he can mount penelope read-only from the student VMs. I don't think this course has any need to write to the shared drive, so that should fix that problem (or in any case, it should fail early).

colin-nolan commented 8 years ago

@Xophmeister I don't suppose that would work as the tools state that they require the reference "prefix", suggesting that they also exploit the files produced when the reference is indexed.
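
To illustrate what a "prefix" implies, a small sketch that lists every file sharing a given prefix. The function name and the example file names are hypothetical; which index files a particular tool actually writes is not something this sketch assumes.

```python
from pathlib import Path


def files_for_prefix(prefix):
    """List every file in the prefix's directory whose name starts with the prefix name."""
    prefix = Path(prefix)
    return sorted(
        p for p in prefix.parent.iterdir()
        if p.is_file() and p.name.startswith(prefix.name)
    )
```

If the result contains more than the reference itself, linking only the single reference file into a working directory would leave the other files behind.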

colin-nolan commented 8 years ago

@jrandall:

> I fear employing your other suggestion, using environment variables to point to the various paths, could lead to even greater confusion. It might be worth a try next time around to see how it goes, but if we don't explain what they are doing I fear they will lose the plot and have no idea what is actually happening.

Yes, I also suspect that may be the consequence; it would make things more "magical" and therefore probably more confusing.

> Some of the people with path problems had also overinterpreted how to expand the tar file, and had ended up changing to the shared penelope folder and expanding it there instead of in their home directory. That resulted in extra confusion later, as several students were then working in the same shared folder and were constantly overwriting each other's work. I've already asked the IT guy if he can mount penelope read-only from the student VMs. I don't think this course has any need to write to the shared drive, so that should fix that problem (or in any case, it should fail early).

Given that the materials have been sorted a bit more now, we should be ready in time to get our files baked into the home directory of the VM next time.