steineggerlab / ufcg

UFCG: Universal Fungal Core Genes
https://ufcg.steineggerlab.com
GNU General Public License v3.0

Metadata file checking fails when .tsv is symbolic link #28

Open jackscanlan opened 2 weeks ago

jackscanlan commented 2 weeks ago

Hi, this is less of a bug and more of an unexpected behaviour! Posting here so others who might have had this issue know a fix.

I'm using ufcg with the v1.0.5 container in Nextflow. Nextflow executes each software process in a separate work directory and stages input files into this directory as symbolic links rather than copying them. I was supplying the metadata.tsv file via an input channel, which meant it arrived as a symbolic link, and ufcg rejected it:

      __  __ _____ _____ _____
     / / / // ___// ___// ___/
    / / / // /_  / /   / / __
   / /_/ // __/ / /___/ /_/ /
   \____//_/    \____/\____/ v1.0.5

  [JUN 24 00:54:18] UFCG  |:  Verbose option check.
  [JUN 24 00:54:18] UFCG  |:  Timestamp printing option check.
  [JUN 24 00:54:18] UFCG  |:  Input file check : GCA_013839505.1_ASM1383950v1_genomic.fna
  [JUN 24 00:54:18] UFCG  |:  Input argument : symbolic link to /group/pathogens/IAWS/Personal/JackS/dev/fungal-phylo/work/a4/656055c898c37762af308a88ba3c0e/GCA_013839505.1_ASM1383950v1_genomic.fna
  [JUN 24 00:54:18] UFCG  |:  Output directory check : .
  [JUN 24 00:54:18] UFCG  |:  Temporary directory check : /tmp/GCA_013839505.1
  [JUN 24 00:54:18] UFCG  |:  Custom CPU thread count check : 1
  [JUN 24 00:54:18] UFCG  |:  Metadata file check : repository_metadata.tsv
  [JUN 24 00:54:18] UFCG  |:  ERROR! Invalid file given : repository_metadata.tsv
  [JUN 24 00:54:18] UFCG  |:  Run with "profile -h" option to see the user manual.

The fix is to use the readlink command in bash to resolve the symbolic link (here repository_metadata.tsv) to the absolute path of its target:

META_PATH=$(readlink -f repository_metadata.tsv)

...and then use this as the metadata file path in ufcg:

ufcg profile -i ./input -o ./output -m "$META_PATH"
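For anyone wrapping this in a Nextflow script block, the two commands above could be combined roughly as follows. This is only a sketch: the helper name `resolve_meta` is made up for illustration and is not part of ufcg, and it assumes the GNU coreutils `readlink` (with its `-f` option) is available in the container.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ufcg): print the fully resolved path of a
# file, following symbolic links, or fail if the target does not exist.
resolve_meta() {
    local resolved
    # readlink -f canonicalises the path (GNU coreutils behaviour assumed)
    resolved=$(readlink -f -- "$1") || return 1
    # Guard against dangling links: the resolved target must be a regular file
    [ -f "$resolved" ] || return 1
    printf '%s\n' "$resolved"
}

# Usage, with the same file names as in the commands above:
# META_PATH=$(resolve_meta repository_metadata.tsv) || exit 1
# ufcg profile -i ./input -o ./output -m "$META_PATH"
```

The extra `-f` test is there because `readlink -f` can still print a path when the final link target is missing, so a dangling link would otherwise only show up later as the same "Invalid file given" error.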

Because the fix is so easy in bash, I'm not sure it's worth trying to fix this within ufcg, but it might be worth adding a note to the documentation. It does seem strange to me, though, that ufcg recognises and accepts symbolic links for the input files (see output above).

jackscanlan commented 2 weeks ago

Also, because this is such a minor thing, I didn't think it was worth its own issue: the manual page currently shows the wrong flag (-c not -t) for setting CPU threads in ufcg profile.

endixk commented 1 week ago

Hi, thank you for pointing this out!

This is indeed unexpected; I've never tested with symbolic links as input. I don't think it will be too difficult to write code that tracks down the actual file when a symlink is given. I'll try to implement this and release an update soon.
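To illustrate what "tracking down the actual file" involves, here is a rough shell sketch that follows a chain of links by hand, one hop at a time. It is not the ufcg implementation; the function name `follow_links` and the hop limit of 40 are assumptions made for this example only.

```shell
#!/usr/bin/env bash
# Illustrative only: follow a chain of symbolic links by hand until a
# non-link path is reached. Not the ufcg implementation.
follow_links() {
    local path="$1" target hops=0
    while [ -L "$path" ]; do
        # Give up on suspiciously long chains (possible link loop)
        hops=$((hops + 1))
        [ "$hops" -le 40 ] || return 1
        # Plain readlink prints the immediate target of one link
        target=$(readlink -- "$path") || return 1
        case $target in
            /*) path=$target ;;                        # absolute target
            *)  path=$(dirname -- "$path")/$target ;;  # relative to the link's directory
        esac
    done
    # The final path must be an existing regular file
    [ -f "$path" ] || return 1
    printf '%s\n' "$path"
}
```

Relative targets are interpreted against the directory that holds the link, not the current working directory, which is the detail most easily missed when resolving links manually.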

I also fixed the typo in the manual page. Thanks!