sfingram / apq-qsne

A collection of code for Dimensionality Reduction for Documents with Nearest Neighbor Queries

Example input files from vectors to t-SNE #1

Open avilella opened 4 months ago

avilella commented 4 months ago

Hi all,

I've been searching GitHub for ways to classify documents (e.g. Word or Excel documents in a corpus). I found the paper on Q-SNE interesting, and the code is self-contained and easy to deploy.

Would it be possible to have some example input files in this repo and a brief description of the steps to go from them to a t-SNE coordinates output file?

The example datasets in the paper were either already pre-processed (TF-IDF) at the source they were downloaded from, or I couldn't find them to download myself.
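For context, TF-IDF pre-processing of raw text can be sketched roughly as follows. This is only a minimal illustration, assuming naive whitespace tokenization and the common count * log(N / df) weighting. The function name `tfidf` is made up for this example, the weighting used for the paper's datasets may differ, and the output is not the .vec format that csv2dmat expects.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting: raw term count * log(N / document frequency).
    """
    # Naive tokenization: lowercase and split on whitespace (an assumption).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in counts.items()})
    return vectors


if __name__ == "__main__":
    corpus = ["the cat sat", "the dog sat", "the bird flew"]
    for vec in tfidf(corpus):
        print(vec)
```

A term that occurs in every document (here "the") gets weight 0, while rarer terms get larger weights.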

I am assuming a combination of csv2dmat and tsne should be enough, but I'm not sure whether the testapq binary wraps a similar approach into a single step.

My plan is to do:

./csv2dmat -i e.test.vec -d e.dmat
./tsne -i e.dmat

But I am unsure what the .vec file format should be.

Thanks in advance.

sfingram commented 4 months ago

Hi @avilella, unfortunately I no longer have time to devote to this project (an academic project from several years back), but the examples are located at the project page: https://www.cs.ubc.ca/labs/imager/tr/2014/QSNE/

You will have to copy the links and replace http with https (I can no longer edit that page to correct it).
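If there are many links to fix, the scheme swap can be scripted. A minimal sketch using only the Python standard library; the helper name `to_https` is made up for this example:

```python
from urllib.parse import urlsplit, urlunsplit


def to_https(url):
    """Return url with an http scheme swapped for https; other URLs are unchanged."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


if __name__ == "__main__":
    print(to_https("http://www.cs.ubc.ca/labs/imager/tr/2014/QSNE/"))
    # prints: https://www.cs.ubc.ca/labs/imager/tr/2014/QSNE/
```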

Edit (here are the corrected links for your convenience):

warlogs TFIDF term-vectors → (BZip2 archive, 1MB)
metacombine TFIDF term-vectors → (BZip2 archive, 871KB)
textfiles TFIDF term-vectors → (BZip2 archive, 149MB)
cables TFIDF term-vectors → (BZip2 archive, 318MB)
nytimes TFIDF term-vectors → (BZip2 archive, 524MB)
pubmed TFIDF term-vectors → (BZip2 archive, 4GB)
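Once downloaded, the archives can be unpacked with the `bunzip2` command-line tool or from Python. The sketch below assumes each download is a single bzip2-compressed file rather than a tar archive (if it turns out to be a tarball, `tarfile.open(path, "r:bz2")` from the standard library is the alternative). The helper name `decompress_bz2` and the file names in the usage comment are hypothetical.

```python
import bz2
import shutil


def decompress_bz2(src_path, dst_path):
    """Stream-decompress a .bz2 file to dst_path without loading it all into memory."""
    with bz2.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


# Hypothetical usage; the real archive names may differ:
# decompress_bz2("warlogs.vec.bz2", "warlogs.vec")
```

Streaming matters for the larger datasets, since the pubmed archive alone is 4GB compressed.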

avilella commented 4 months ago

Thanks, downloading now.

On Thu, May 2, 2024 at 1:48 PM Stephen Ingram wrote:

Hi @avilella, unfortunately I no longer have time to devote to this project (an academic project from several years back), but the examples are located at the project page: https://www.cs.ubc.ca/labs/imager/tr/2014/QSNE/

You will have to copy the links and replace the http with https ... (I can no longer edit that page to correct it)


avilella commented 4 months ago

Could you recommend another GitHub repository for producing t-SNE plots in a way similar to apq-qsne? Thanks in advance.
