pan-genome results visualization

fetyj commented 1 year ago

Hi, I'm trying to visualize my analysis, but I get an error when I try to start the local web server with npx serve:

npx: installed 91 in 6.279s
Unexpected token {

I was unable to access my raw data in the browser at http://localhost:8001. Could you help with this issue? Thanks

Fety

fetyj commented 1 year ago

After updating the serve package to 13.0.4, the server starts and I can access all the files at http://0.0.0.0:8001, but when running npm run build I get this error:

> panX-visualization@2.0.0 build /home/fiestaj/pan-genome-visualization
> NODE_ENV=production npm run build:webpack && NODE_ENV=production npm run build:gulp && NODE_ENV=production npm run build:html

> panX-visualization@2.0.0 build:webpack /home/fiestaj/pan-genome-visualization
> NODE_ENV=production webpack --env production

Hash: f534ac85224596f5c797
Version: webpack 2.2.1
Time: 2380ms
           Asset     Size  Chunks                    Chunk Names
client_bundle.js  2.58 MB       0  [emitted]  [big]  client_bundle
     homepage.js  1.09 MB       1  [emitted]  [big]  homepage
   [0] ./~/d3/d3.js 337 kB {0} {1} [built]
   [1] ./public/javascripts/global.js 2.64 kB {0} [built]
   [3] ./~/jquery/dist/jquery.js 292 kB {0} {1} [built]
   [4] ./public/javascripts/tooltips.js 8.33 kB {0} [built]
   [5] ./~/datatables.net/js/jquery.dataTables.mjs 438 kB {0} {1} [built]
   [7] ./public/phyloTree/src/updateTree.js 7.29 kB {0} [built]
   [8] ./public/javascripts/data_path.js 1.17 kB {0} [built]
   [9] ./public/javascripts/tree-init.js 16 kB {0} [built]
  [39] ./public/javascripts/speciesTreeCallbacks.js 3.25 kB {0} [built]
  [46] ./public/javascripts/homepage.js 3.19 kB {1} [built]
  [47] ./public/javascripts/render_viewer.js 9.52 kB {0} [built]
 [118] ./public/javascripts/chartsAndClusterTable.js 18.5 kB {0} [built]
 [122] ./public/javascripts/speciesTree.js 2.54 kB {0} [built]
 [125] multi ./public/javascripts/homepage.js 28 bytes {1} [built]
 [126] multi ./public/javascripts/render_viewer.js 28 bytes {0} [built]
    + 112 hidden modules

> panX-visualization@2.0.0 build:gulp /home/fiestaj/pan-genome-visualization
> NODE_ENV=production gulp

[16:26:19] Using gulpfile ~/pan-genome-visualization/gulpfile.js
[16:26:19] Starting 'miniCSS'...
[16:26:19] Finished 'miniCSS' after 17 ms
[16:26:19] Starting 'default'...
[16:26:19] Finished 'default' after 32 μs

> panX-visualization@2.0.0 build:html /home/fiestaj/pan-genome-visualization
> rm -rf public/*.html && node ./scripts/renderPageHtml.js && node ./scripts/renderPathogenHtml.js

{ FetchError: request to http://localhost:8001/index.json failed, reason: connect ECONNREFUSED 127.0.0.1:8001
    at ClientRequest.<anonymous> (/home/fiestaj/pan-genome-visualization/node_modules/node-fetch/lib/index.js:1491:11)
    at ClientRequest.emit (events.js:182:13)
    at Socket.socketErrorListener (_http_client.js:391:9)
    at Socket.emit (events.js:182:13)
    at emitErrorNT (internal/streams/destroy.js:82:8)
    at emitErrorAndCloseNT (internal/streams/destroy.js:50:3)
    at process._tickCallback (internal/process/next_tick.js:63:19)
  message:
   'request to http://localhost:8001/index.json failed, reason: connect ECONNREFUSED 127.0.0.1:8001',
  type: 'system',
  errno: 'ECONNREFUSED',
  code: 'ECONNREFUSED' }
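
The log above shows the build:html step requesting http://localhost:8001/index.json, and ECONNREFUSED means that no process accepted the connection on that port at that moment. A minimal sketch of a pre-build check, assuming the data server is expected on localhost:8001 as in the error message (the function name port_is_open is hypothetical and not part of the app):

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    # Host and port are taken from the error message in the log above.
    if port_is_open("localhost", 8001):
        print("Something is listening on localhost:8001.")
    else:
        print("Nothing is listening on localhost:8001; start the data server first.")
```

If the check fails, the data server probably needs to be running in a separate terminal while npm run build executes.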

ivan-aksamentov commented 1 year ago

Hi @fetyj,

I noticed that the preparation of custom datasets is not explained well in the docs. I just pushed an update to the README.md in an attempt to improve this.

Could you please go through the steps in the section "Running locally with the default data" and then steps in the section "Running locally with your own data", to see if it works for you?

Unfortunately, the app is very old and I sometimes have problems running it myself. But let's find out how to do it together.

fetyj commented 1 year ago

Hi @ivan-aksamentov, thanks for your help. I went through all the steps, but I still get this error:

npm run build

> panX-visualization@2.0.0 build /home/fiestaj/pan-genome-visualization
> NODE_ENV=production npm run build:webpack && NODE_ENV=production npm run build:gulp && NODE_ENV=production npm run build:html

> panX-visualization@2.0.0 build:webpack /home/fiestaj/pan-genome-visualization
> NODE_ENV=production webpack --env production

Hash: c3047871bb267aaa73db
Version: webpack 2.2.1
Time: 3599ms
           Asset     Size  Chunks                    Chunk Names
client_bundle.js  3.39 MB       0  [emitted]  [big]  client_bundle
     homepage.js  1.08 MB       1  [emitted]  [big]  homepage
   [1] ./~/d3/d3.js 337 kB {0} {1} [built]
  [12] ./~/jquery/dist/jquery.js 268 kB {0} {1} [built]
  [15] ./public/javascripts/tooltips.js 8.33 kB {0} [built]
  [21] ./~/datatables.net/js/jquery.dataTables.js 449 kB {0} {1} [built]
  [23] ./public/phyloTree/src/updateTree.js 7.29 kB {0} [built]
 [192] ./public/javascripts/geneTreeCallbacks.js 3.22 kB {0} [built]
 [193] ./public/javascripts/meta-color-assignment.js 8.47 kB {0} [built]
 [194] ./public/javascripts/meta-color-legend.js 11.6 kB {0} [built]
 [195] ./public/javascripts/speciesTreeCallbacks.js 3.25 kB {0} [built]
 [202] ./public/javascripts/homepage.js 3.19 kB {1} [built]
 [203] ./public/javascripts/render_viewer.js 9.52 kB {0} [built]
 [524] ./public/javascripts/chartsAndClusterTable.js 18.5 kB {0} [built]
 [528] ./public/javascripts/speciesTree.js 2.54 kB {0} [built]
 [531] multi ./public/javascripts/homepage.js 28 bytes {1} [built]
 [532] multi ./public/javascripts/render_viewer.js 28 bytes {0} [built]
    + 518 hidden modules

> panX-visualization@2.0.0 build:gulp /home/fiestaj/pan-genome-visualization
> NODE_ENV=production gulp

[16:16:06] Using gulpfile ~/pan-genome-visualization/gulpfile.js
[16:16:06] Starting 'miniCSS'...
[16:16:06] Finished 'miniCSS' after 18 ms
[16:16:06] Starting 'default'...
[16:16:06] Finished 'default' after 33 μs

> panX-visualization@2.0.0 build:html /home/fiestaj/pan-genome-visualization
> rm -rf public/*.html && node ./scripts/renderPageHtml.js && node ./scripts/renderPathogenHtml.js

SyntaxError: Unexpected token < in JSON at position 0
    at JSON.parse (<anonymous>)
    at main (/home/fiestaj/pan-genome-visualization/scripts/renderPathogenHtml.js:58:26)
    at process._tickCallback (internal/process/next_tick.js:68:7)

After starting serve, my data is accessible at http://localhost:8001/data/speciesname/vis/coreGenomeTree.json and not, as specified in the README, at http://localhost:8001/data/speciesname/coreGenomeTree.json. I don't know if this is related to my issue.

ivan-aksamentov commented 1 year ago

@fetyj The URL

http://localhost:8001/data/speciesname/vis/coreGenomeTree.json

contains a vis segment. So you probably have a directory vis/ inside the directory speciesname/. It should not be there. Try to put all the files from vis/ directly into speciesname/ and delete vis/. Compare your directory tree to the directory tree described in the README (e.g. use the tree command on Linux). In the example provided in the README, the path to coreGenomeTree.json looks like this:

pangenome-data/dataset/Escherichia_coli/coreGenomeTree.json

Also note that the correct URL should be

http://localhost:8001/dataset/Escherichia_coli/coreGenomeTree.json

i.e. with dataset/ and not data/ like in your example.

The directory tree should be exactly as described in the README. Only pathogen names can vary. All expected path segments should be there, and they should be named exactly as shown. There should not be any extra path segments. If you don't follow the path conventions, the build scripts will simply not find your data.
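
The two conventions named so far (coreGenomeTree.json sits directly inside dataset/<pathogen>/, and there is no extra vis/ segment) can be checked mechanically. A minimal sketch, assuming only those two rules; the README is the authority on the full layout, and find_layout_problems is a hypothetical helper, not part of the app:

```python
from pathlib import Path
from typing import List


def find_layout_problems(data_root: Path) -> List[str]:
    """List deviations from the two path conventions mentioned in this thread."""
    dataset_dir = data_root / "dataset"
    if not dataset_dir.is_dir():
        return [f"missing directory: {dataset_dir}"]

    problems = []
    for species_dir in sorted(p for p in dataset_dir.iterdir() if p.is_dir()):
        # An extra vis/ level shifts every file one path segment too deep.
        if (species_dir / "vis").is_dir():
            problems.append(f"{species_dir.name}: unexpected vis/ subdirectory")
        # The README example places this file directly in the pathogen directory.
        if not (species_dir / "coreGenomeTree.json").is_file():
            problems.append(f"{species_dir.name}: coreGenomeTree.json not found")
    return problems
```

Calling find_layout_problems(Path("pangenome-data")) on the data directory from the README example should return an empty list when both conventions hold.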

Regarding the error:

SyntaxError: Unexpected token < in JSON at position 0

It seems that the index.json has not been generated correctly. This is probably due to the vis/ subdirectory problem. Try to remove the vis/ directory as described above, then re-index the data and re-build the app, as you did before.
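
"Unexpected token < in JSON at position 0" means the text handed to JSON.parse starts with "<", which is typical of an HTML page (for example an error page) being returned where JSON was expected. A minimal sketch for classifying a response body, assuming you have saved or fetched the text of index.json yourself; diagnose_json_text is a hypothetical helper, not part of the app:

```python
import json


def diagnose_json_text(text: str) -> str:
    """Classify a response body as 'json', 'html', or 'invalid'."""
    stripped = text.lstrip()
    if stripped.startswith("<"):
        # Markup where JSON was expected, e.g. an HTML error page.
        return "html"
    try:
        json.loads(stripped)
    except json.JSONDecodeError:
        return "invalid"
    return "json"
```

For example, diagnose_json_text('<!DOCTYPE html>') returns 'html', which would point to a wrong path or a missing file rather than to a malformed index.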

Let me know how it goes!

In case of problems, please provide the output of your tree command as well as the contents of the generated index.json file.

And if you have any ideas on how to improve the process or how to document it better for other people, please don't hesitate to let us know. Contributions are very welcome!

fetyj commented 1 year ago

Hi @ivan-aksamentov, I think there is a tricky issue with the data generated by the ./panX.py analysis. As you can see, after running the analysis I have the following contents:

.
└── dataset
    └── S_th_bpx
        ├── allclusters_final.tsv
        ├── geneCluster
        ├── geneID_to_description.cpk
        ├── geneID_to_geneSeqID.cpk
        ├── input_GenBank
        ├── log
        ├── metainfo.tsv
        ├── nucleotide_fna
        ├── protein_faa
        ├── RNA_fna
        ├── strain_list.cpk
        ├── tmp_core
        └── vis

10 directories, 5 files

So I moved all the files from vis/ to S_th_bpx/, but the vis/ folder already contains a geneCluster directory, so I merged the two.

.
└── dataset
    └── S_th_bpx
        ├── allclusters_final.tsv
        ├── coreGenomeTree.json
        ├── geneCluster
        ├── geneCluster.json
        ├── geneID_to_description.cpk
        ├── geneID_to_geneSeqID.cpk
        ├── input_GenBank
        ├── log
        ├── metaConfiguration.js
        ├── metainfo.tsv
        ├── nucleotide_fna
        ├── protein_faa
        ├── RNA_fna
        ├── strain_list.cpk
        ├── strainMetainfo.json
        ├── strain_tree.nwk
        └── tmp_core

9 directories, 10 files

I noticed that I have neither the all_gene_alignments.zip nor the core_gene_alignments.zip file in my S_th_bpx/ directory, and the geneCluster contents seem different: I have no files with _refined_ in the name. Nevertheless, I can get everything working, but as expected I can't get the alignment files of a particular gene when I select it in the web app.

ivan-aksamentov commented 1 year ago

@fetyj You should not merge the geneCluster directory. Keep it in S_th_bpx/geneCluster.

plaquette commented 1 year ago

as @ivan-aksamentov has mentioned the geneCluster directory should be kept in its original state and not be merged.

regarding the alignments:

the all_gene_alignments.zip and core_gene_alignments.zip archives are computed by us in a different step to be distributed via pangenome.org and are (as far as i know) not required for the visualization to work locally with your own data.

the _refined_ string in the alignment file names should also not change anything with respect to the download of individual files. to debug this, copy the download link and check if it points to the actual alignment file. my assumption would be that, since you merged the folders, it points to somewhere like .../S_th_bpx/geneCluster/aln.fa and since it's not there anymore, the download would not work.
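
The link check described above can be sketched as follows. It assumes the static server maps URL paths one-to-one onto files under the served directory, as the README example earlier in the thread suggests (pangenome-data/dataset/... served as http://localhost:8001/dataset/...). The name url_to_local_path is hypothetical:

```python
from pathlib import Path
from urllib.parse import unquote, urlparse


def url_to_local_path(url: str, served_root: Path) -> Path:
    """Map a download URL to the file it should correspond to on disk."""
    # Keep only the path component and drop the leading slash so that
    # it can be joined onto the served directory.
    relative = unquote(urlparse(url).path).lstrip("/")
    return served_root / relative


if __name__ == "__main__":
    link = "http://localhost:8001/dataset/Escherichia_coli/coreGenomeTree.json"
    path = url_to_local_path(link, Path("pangenome-data"))
    print(path, "exists" if path.is_file() else "is missing")
```

Pasting a copied alignment download link in place of the example URL shows which file the web app expects and whether it is actually there.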

fetyj commented 1 year ago

What should I do with the geneCluster folder that is in vis/? Should I get rid of it or rename it?

plaquette commented 1 year ago

ah sorry maybe i misunderstood you - you take only the vis folder and its contents and move it to wherever you host the visualization. everything needed should be in there.

fetyj commented 1 year ago

@ivan-aksamentov, here is the content of vis/, whose files I have to put directly into my species_name/ directory:

└── vis
    ├── coreGenomeTree.json
    ├── geneCluster
    ├── geneCluster.json
    ├── metaConfiguration.js
    ├── strainMetainfo.json
    └── strain_tree.nwk

As you can see, there is also a folder called geneCluster, so I don't know how to deal with it when copying to species_name/.

Regarding your comment "You should not merge geneCluster directory. Keep it in S_th_bpx/geneCluster.":

It seems to contain all my alignment files. You can have a look at the contents of both folders here: geneCluster_vis_folder.txt, geneCluster_species_name_folder.txt

plaquette commented 1 year ago

this looks good - can you copy and paste a download link for an alignment from your local web view and compare it to your actual file path?

you should move everything from inside of the vis folder to dataset/S_th_bpx/

so try renaming the vis folder to S_th_bpx and moving it into dataset.
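
The suggestion above can be sketched like this. The sketch copies rather than renames, which is a deliberate variation so that the original panX output stays untouched. It assumes the target directory does not exist yet; install_vis_folder and the example paths are hypothetical:

```python
import shutil
from pathlib import Path


def install_vis_folder(vis_dir: Path, data_root: Path, species: str) -> Path:
    """Copy the contents of vis/ to <data_root>/dataset/<species>/."""
    target = data_root / "dataset" / species
    if target.exists():
        # Refuse to merge into an existing directory: merging the two
        # geneCluster folders is what broke the downloads earlier.
        raise FileExistsError(f"{target} already exists; remove or rename it first")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copytree(vis_dir, target)
    return target
```

A call such as install_vis_folder(Path("S_th_bpx/vis"), Path("pangenome-data"), "S_th_bpx") would then produce pangenome-data/dataset/S_th_bpx/ containing coreGenomeTree.json, geneCluster/, and the other files listed above.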

fetyj commented 1 year ago

@plaquette thanks for the tip, renaming the vis/ folder to species_name/ does the job. If you don't mind, how can I get my own core_gene_alignments?

plaquette commented 1 year ago

we have a non-polished (non public) pipeline for that purpose.

what it does is basically:

- takes the geneCluster.json as an input and iterates over all the genes
- for each gene it decides if it's a core gene (based on the number of strains it contains, and if the duplication field's value is "no")
- depending on the previous step each alignment is added to its respective archive
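
A rough sketch of the selection step described above, not the actual (non-public) pipeline. Everything beyond the description is an assumption: geneCluster.json is taken to be a JSON list of objects, the key names "count" and "dupli" are guesses passed in as parameters, and a gene is treated as core when it occurs in at least min_fraction of all strains. Adding alignments to archives is left out because the alignment file naming is not given in this thread.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable, List


def select_core_genes(
    clusters: Iterable[Dict[str, Any]],
    total_strains: int,
    min_fraction: float = 1.0,   # assumption: core = present in all strains
    count_key: str = "count",    # assumed name of the strain-count field
    dupli_key: str = "dupli",    # assumed name of the duplication field
) -> List[Dict[str, Any]]:
    """Return the clusters that look like core genes under the stated assumptions."""
    core = []
    for cluster in clusters:
        enough_strains = cluster.get(count_key, 0) >= min_fraction * total_strains
        no_duplication = cluster.get(dupli_key) == "no"
        if enough_strains and no_duplication:
            core.append(cluster)
    return core


def load_clusters(path: Path) -> List[Dict[str, Any]]:
    """Read geneCluster.json, assuming it holds a JSON list of gene clusters."""
    with path.open() as handle:
        return json.load(handle)
```

Inspecting one entry of your own geneCluster.json first will show the real field names, which can then be passed as count_key and dupli_key.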

...maybe we'll add this to the main pipeline at some point.

since it works now - can we close the issue?

fetyj commented 1 year ago

I get the principle, but I'm not sure I can reproduce it with my current bioinformatics skills... Anyway, thanks for your help :)

Fety

neherlab / pan-genome-visualization

pan-genome results visualization #17