ncbi / fcs

Foreign Contamination Screening caller scripts and documentation

Issue with using get-fasta command #87

Closed: ShenMJ99 closed this issue 1 week ago

ShenMJ99 commented 2 weeks ago

I found the README at https://github.com/ncbi/fcs-gx/tree/release, which has a section on "Useful GX subcommands". I am interested in using the get-fasta functionality described there. However, the path ./dist/gx, which is supposed to point to ../build/src/gx, does not exist on my system.

The README describes the functionality as follows:

"The sequences used to build the gx database are listed in the file all.seq_info.tsv.gz within the gxdb folder. From there, you can select the sequences of your choice, and then generate the fasta files using the gx get-fasta subcommand:

./dist/gx get-fasta --db /dev/shm/gxdb/all.gxi --input 3col.txt --output out.txt

The input file, which is provided by the user, is a tab delimited, 3 column file in the following format, along with the header:

cat 3col.txt
[["GX locs",1,1]]
NC_060925.1 . .

To get the fasta for a specific set of coordinates, format your input file with the start and end coordinates in the 2nd and 3rd column, respectively:

[["GX locs",1,1]]
NC_060925.1 1 200"
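A minimal shell sketch of preparing the get-fasta input described in the excerpt above: one header line followed by tab-separated rows of sequence ID, start and end, with `.` in both coordinate columns to request the whole sequence. The helper name `write_3col` and the output file names are hypothetical, and the header string is copied verbatim from the excerpt as it appears in this thread, so check it against the README before relying on it.

```shell
#!/usr/bin/env bash
# ASSUMPTION: header copied from the README excerpt quoted in this thread;
# verify it against the current fcs-gx README.
GX_LOCS_HEADER='[["GX locs",1,1]]'

# write_3col OUT_FILE SEQ_ID [START END]   (hypothetical helper)
# Writes the header plus one tab-separated row. Without START/END, both
# coordinate columns are '.', which requests the whole sequence.
write_3col() {
  if [ $# -ne 2 ] && [ $# -ne 4 ]; then
    echo "usage: write_3col OUT_FILE SEQ_ID [START END]" >&2
    return 1
  fi
  local out=$1 seq_id=$2 start=${3:-.} end=${4:-.}
  {
    printf '%s\n' "$GX_LOCS_HEADER"
    printf '%s\t%s\t%s\n' "$seq_id" "$start" "$end"
  } > "$out"
}

# The two README examples: the whole sequence, and bases 1..200 of it.
write_3col whole.3col.txt NC_060925.1
write_3col range.3col.txt NC_060925.1 1 200
```

Each file is then passed with --input, for example `./dist/gx get-fasta --gx-db /dev/shm/gxdb/all.gxi --input range.3col.txt --output range.fa`. The get-fasta help output later in this thread lists the database option as --gx-db, whereas the README excerpt uses --db, so go by what --help prints for your build.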

Could you please explain how to get this subcommand working, given that ./dist/gx is missing, and how to use get-fasta correctly?

etvedte commented 2 weeks ago

The path should already exist when you git clone the repo. However, dist/gx is a symlink to ../build/src/gx, so you have to build with make before running any GX commands; otherwise it won't work. My guess is that you forgot to do that.

git clone https://github.com/ncbi/fcs-gx.git
cd fcs-gx

ls -l dist/gx
lrwxrwxrwx 1 tvedtees giindex 15 Jun 17 08:50 dist/gx -> ../build/src/gx

./dist/gx get-fasta --help
bash: ./dist/gx: No such file or directory

make
...
...
[100%] Built target gx

./dist/gx get-fasta --help
Fetch fasta from db.
Usage: ./dist/gx get-fasta [OPTIONS]

Options:
  -i,--input file [stdin]     3-column TSV of locs, same format as --hardmask in make-db mode.

  --gx-db file [db/all.gxi]   /dev/shm/path/to/gxdb/ produced by gx make-db.
                              Should be placed in RAM-disk.

  -o,--output file [stdout]   Fasta.

  -h,--help                   Print this help message and exit.

Eric
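The `ls -l` output above explains the "No such file or directory" message: `dist/gx` is a symlink whose target, `../build/src/gx`, only appears once `make` has finished, and running a dangling symlink fails just as if the file were absent. A small shell check along those lines, as a sketch (the function name is hypothetical; the get-fasta options follow the help text above, and the file names are placeholders):

```shell
#!/usr/bin/env bash
# check_gx_binary [PATH]   (hypothetical helper)
# Succeeds if PATH (default ./dist/gx) resolves to an executable regular file.
# Otherwise says whether it is a dangling symlink (not built yet) or missing.
check_gx_binary() {
  local gx=${1:-./dist/gx}
  # -f and -x follow symlinks, so a link to a built binary passes here.
  if [ -f "$gx" ] && [ -x "$gx" ]; then
    return 0
  elif [ -L "$gx" ] && [ ! -e "$gx" ]; then
    echo "$gx -> $(readlink "$gx"), but that target does not exist; run make first" >&2
    return 1
  else
    echo "$gx is missing or not an executable file" >&2
    return 1
  fi
}

# Only call get-fasta once the binary is really there.
if check_gx_binary ./dist/gx; then
  ./dist/gx get-fasta --gx-db /dev/shm/gxdb/all.gxi --input 3col.txt --output out.fa
fi
```

Since --input and --output default to stdin and stdout according to the help text, the same call could also be written with shell redirection.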

etvedte commented 1 week ago

Please re-open this issue if you have additional questions.

ShenMJ99 commented 2 days ago

Thank you very much for your earlier answer. I didn't thank you right away because I thought I might have more questions later.

I was able to extract fasta sequences from the database using gx get-fasta from the conda package.

Now I am trying to use the main feature of fcs-gx, which is screening and cleaning genome data for submission. However, I ran into a problem when running make that I have not been able to resolve.

System version: Rocky Linux release 8.7 (Green Obsidian)
Software versions:
gcc (GCC) 11.2.0
cmake version 3.14.0
Python 3.10.8

The detailed error is as follows:

(base) [shenmj@master fcs-gx]$ make
cmake -B build -DCMAKE_BUILD_TYPE=RELEASE
-- The C compiler identification is GNU 11.2.0
-- The CXX compiler identification is GNU 11.2.0
-- Check for working C compiler: /home/apps/gcc/11.2.0/bin/gcc
-- Check for working C compiler: /home/apps/gcc/11.2.0/bin/gcc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /home/apps/gcc/11.2.0/bin/g++
-- Check for working CXX compiler: /home/apps/gcc/11.2.0/bin/g++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done
-- Generating done
-- Build files have been written to: /home/usr/shenmj/databasecheck/FCS-GX/fcs-gx/build
make -C build all
make[1]: Entering directory '/home/usr/shenmj/databasecheck/FCS-GX/fcs-gx/build'
make[2]: Entering directory '/home/usr/shenmj/databasecheck/FCS-GX/fcs-gx/build'
make[3]: Entering directory '/home/usr/shenmj/databasecheck/FCS-GX/fcs-gx/build'
Scanning dependencies of target gx
make[3]: Leaving directory '/home/usr/shenmj/databasecheck/FCS-GX/fcs-gx/build'
make[3]: Entering directory '/home/usr/shenmj/databasecheck/FCS-GX/fcs-gx/build'
[ 5%] Building CXX object src/CMakeFiles/gx.dir/GP_36950_compare_hmers_to_kmers.cpp.o
[ 11%] Building CXX object src/CMakeFiles/gx.dir/align.cpp.o
In file included from /home/usr/shenmj/databasecheck/FCS-GX/fcs-gx/include/ext/json5.hpp:6,
                 from /home/usr/shenmj/databasecheck/FCS-GX/fcs-gx/src/align.cpp:38:
/home/apps/gcc/11.2.0/include/c++/11.2.0/variant: In instantiation of ‘struct std::variant_size<const gx::json5::value_t>’:
/home/apps/gcc/11.2.0/include/c++/11.2.0/variant:1754:13:   required from ‘constexpr std::__detail::__variant::__visit_result_t<_Visitor, _Variants ...> std::visit(_Visitor&&, _Variants&& ...) [with _Visitor = gx::json5::s_to_stream(std::ostream&, const gx::json5::value_t&)::<lambda(auto:30&&)>; _Variants = {const gx::json5::value_t&}; std::__detail::__variant::__visit_result_t<_Visitor, _Variants ...> = void]’
/home/usr/shenmj/databasecheck/FCS-GX/fcs-gx/include/ext/json5.hpp:432:19:   required from here
/home/apps/gcc/11.2.0/include/c++/11.2.0/variant:84:12: error: invalid use of incomplete type ‘struct std::variant_size<gx::json5::value_t>’
   84 | struct variant_size<const _Variant> : variant_size<_Variant> {};
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/apps/gcc/11.2.0/include/c++/11.2.0/variant:81:12: note: declaration of ‘struct std::variant_size<gx::json5::value_t>’
   81 | struct variant_size;
      | ^~~~~~~~~~~~
/home/apps/gcc/11.2.0/include/c++/11.2.0/variant: In instantiation of ‘constexpr std::__detail::__variant::__visit_result_t<_Visitor, _Variants ...> std::visit(_Visitor&&, _Variants&& ...) [with _Visitor = gx::json5::s_to_stream(std::ostream&, const gx::json5::value_t&)::<lambda(auto:30&&)>; _Variants = {const gx::json5::value_t&}; std::__detail::__variant::__visit_result_t<_Visitor, _Variants ...> = void]’:
/home/usr/shenmj/databasecheck/FCS-GX/fcs-gx/include/ext/json5.hpp:432:19:   required from here
/home/apps/gcc/11.2.0/include/c++/11.2.0/variant:1754:20: error: ‘value’ is not a member of ‘std::variant_size<const gx::json5::value_t>’
 1754 | std::make_index_sequence<
      | ^~~~~~~~~~~~~~~~~~~~
 1755 | std::variant_size<remove_reference_t<_Variants>...>::value>());
      | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/apps/gcc/11.2.0/include/c++/11.2.0/variant:1758:29: error: non-constant condition for static assertion
 1758 | static_assert(__visit_rettypes_match,
      | ^~~~~~~~~~~~~~~~~~~~~~
/home/apps/gcc/11.2.0/include/c++/11.2.0/variant: In instantiation of ‘constexpr const size_t std::variant_size_v<const gx::json5::value_t>’:
/home/apps/gcc/11.2.0/include/c++/11.2.0/variant:1049:10:   [ skipping 2 instantiation contexts, use -ftemplate-backtrace-limit=0 to disable ]
/home/apps/gcc/11.2.0/include/c++/11.2.0/variant:1764:34:   required from ‘constexpr std::__detail::__variant::__visit_result_t<_Visitor, _Variants ...> std::visit(_Visitor&&, _Variants&& ...) [with _Visitor = gx::json5::s_to_stream(std::ostream&, const gx::json5::value_t&)::<lambda(auto:30&&)>; _Variants = {const gx::json5::value_t&}; std::__detail::__variant::__visit_result_t<_Visitor, _Variants ...> = void]’
/home/usr/shenmj/databasecheck/FCS-GX/fcs-gx/include/ext/json5.hpp:432:19:   required from here
/home/apps/gcc/11.2.0/include/c++/11.2.0/variant:97:70: error: ‘value’ is not a member of ‘std::variant_size<const gx::json5::value_t>’
   97 | inline constexpr size_t variant_size_v = variant_size<_Variant>::value;
      | ^~~~~
/home/apps/gcc/11.2.0/include/c++/11.2.0/variant: In instantiation of ‘constexpr decltype(auto) std::__do_visit(_Visitor&&, _Variants&& ...) [with _Result_type = std::__detail::__variant::__deduce_visit_result<void>; _Visitor = gx::json5::s_to_stream(std::ostream&, const gx::json5::value_t&)::<lambda(auto:30&&)>; _Variants = {const gx::json5::value_t&}]’:
/home/apps/gcc/11.2.0/include/c++/11.2.0/variant:1764:34:   required from ‘constexpr std::__detail::__variant::__visit_result_t<_Visitor, _Variants ...> std::visit(_Visitor&&, _Variants&& ...) [with _Visitor = gx::json5::s_to_stream(std::ostream&, const gx::json5::value_t&)::<lambda(auto:30&&)>; _Variants = {const gx::json5::value_t&}; std::__detail::__variant::__visit_result_t<_Visitor, _Variants ...> = void]’
/home/usr/shenmj/databasecheck/FCS-GX/fcs-gx/include/ext/json5.hpp:432:19:   required from here
/home/apps/gcc/11.2.0/include/c++/11.2.0/variant:1731:52: error: ‘_S_vtable’ is not a member of ‘std::__detail::__variant::__gen_vtable<std::__detail::__variant::__deduce_visit_result<void>, gx::json5::s_to_stream(std::ostream&, const gx::json5::value_t&)::<lambda(auto:30&&)>&&, const gx::json5::value_t&>’
 1731 | _Result_type, _Visitor&&, _Variants&&...>::_S_vtable;
      | ^~~~~~~~~
make[3]: *** [src/CMakeFiles/gx.dir/build.make:76: src/CMakeFiles/gx.dir/align.cpp.o] Error 1
make[3]: Leaving directory '/home/usr/shenmj/databasecheck/FCS-GX/fcs-gx/build'
make[2]: *** [CMakeFiles/Makefile2:91: src/CMakeFiles/gx.dir/all] Error 2
make[2]: Leaving directory '/home/usr/shenmj/databasecheck/FCS-GX/fcs-gx/build'
make[1]: *** [Makefile:84: all] Error 2
make[1]: Leaving directory '/home/usr/shenmj/databasecheck/FCS-GX/fcs-gx/build'
make: *** [Makefile:4: all] Error 2

Do you know how to solve this problem?

Additionally, I found the ncbi-fcs-gx package in miniconda3 and tried version 0.4.0 from conda. However, for gx clean genome, the --action-report option requires an "Action-report (*.fcs_gx_report.txt produced by run_gx)". Is there currently no way to screen and clean a genome using only the gx tool from the conda package?

Thank you for taking the time to look into my issue!
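The log shows that the top-level `make` runs `cmake -B build -DCMAKE_BUILD_TYPE=RELEASE` and then `make -C build all`, using /home/apps/gcc/11.2.0. If you want to try a different (for example newer) GCC to see whether it gets past the json5.hpp error, which nothing in this thread confirms, one way is to set CC and CXX and rebuild from an empty build directory, since CMake only reads those variables when it first configures a build directory. This is a sketch: the helper name `rebuild_with` and the compiler paths are hypothetical.

```shell
#!/usr/bin/env bash
# rebuild_with CC_PATH CXX_PATH   (hypothetical helper)
# Reconfigures and rebuilds fcs-gx with the given compilers.
# Run from the top of the fcs-gx checkout.
rebuild_with() {
  local cc=$1 cxx=$2 c
  # Refuse to run outside the checkout so `rm -rf build` can't hit the wrong directory.
  if [ ! -f Makefile ] || [ ! -d dist ]; then
    echo "run this from the top of the fcs-gx checkout" >&2
    return 1
  fi
  for c in "$cc" "$cxx"; do
    if ! command -v "$c" > /dev/null 2>&1; then
      echo "compiler not found: $c" >&2
      return 1
    fi
  done
  "$cxx" --version | head -n 1
  # CMake picks up CC/CXX only on the first configure, so start from scratch.
  rm -rf build
  CC=$cc CXX=$cxx make
}

# Hypothetical paths; substitute a compiler that exists on your cluster.
# rebuild_with /path/to/newer/bin/gcc /path/to/newer/bin/g++
```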