uwdata / termite-data-server

Data Server for Topic Models
BSD 3-Clause "New" or "Revised" License

IOError: [Errno 2] No such file or directory: 'data/demo/infovis/corpus/corpus.txt' #32

Open dbl001 opened 7 years ago

dbl001 commented 7 years ago

I'm trying to install termite-data-server on OS X 10.11.6. Running the infovis demo fails; the output of two consecutive runs follows.

./demo.py infovis
--------------------------------------------------------------------------------
Build a topic model (mallet) using a demo dataset (infovis)
  database = data/demo/infovis/corpus
    corpus = data/demo/infovis/corpus
     model = data/demo/infovis/model-mallet
       app = infovis_mallet
--------------------------------------------------------------------------------
# Setting up the infovis dataset...
    Creating folder 'data/demo/infovis'...
    Downloading...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0   154    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (7) Failed to connect to localhost port 80: Connection refused
    Uncompressing...
[data/demo/infovis/download/infovis.zip]
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of data/demo/infovis/download/infovis.zip or
        data/demo/infovis/download/infovis.zip.zip, and cannot find data/demo/infovis/download/infovis.zip.ZIP, period.
    Extracting corpus.txt from corpus.db...
Exporting database [data/demo/infovis/corpus/corpus.db] to file [data/demo/infovis/corpus/corpus.txt]
Traceback (most recent call last):
  File "bin/export_corpus.py", line 25, in <module>
    main()
  File "bin/export_corpus.py", line 22, in main
    ExportCorpus(args.database, args.corpus)
  File "bin/export_corpus.py", line 14, in ExportCorpus
    with Corpus_DB(database_path) as corpusDB:
  File "/Users/davidlaxer/termite-data-server/bin/db/Corpus_DB.py", line 33, in __enter__
    self.DefineOptionsTable()
  File "/Users/davidlaxer/termite-data-server/bin/db/Corpus_DB.py", line 52, in DefineOptionsTable
    self.SetOption( key, value, overwrite = self.isInit )
  File "/Users/davidlaxer/termite-data-server/bin/db/Corpus_DB.py", line 57, in SetOption
    if self.db( where ).count() > 0:
  File "web2py/gluon/dal.py", line 10515, in count
    return db._adapter.count(self.query,distinct)
  File "web2py/gluon/dal.py", line 1902, in count
    self.execute(self._count(query, distinct))
  File "web2py/gluon/dal.py", line 1969, in execute
    return self.log_execute(*a, **b)
  File "web2py/gluon/dal.py", line 1963, in log_execute
    ret = self.cursor.execute(command, *a[1:], **b)
sqlite3.OperationalError: no such table: options
    Corpus available: data/demo/infovis/corpus
# Setting up MALLET (MAchine Learning for LanguagE Toolkit)...
    Creating folder 'externals/mallet-2.0.7'...
    Downloading...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 11.8M  100 11.8M    0     0  1624k      0  0:00:07  0:00:07 --:--:-- 2011k
    Extracting license...
    Creating folder 'tools/mallet-2.0.7'...
    Uncompressing...
    Available: tools/mallet-2.0.7
    Available: tools/mallet-2.0.7
# Setting up Stanford CoreNLP tools...
    Creating folder 'externals/corenlp-3.3.1'...
    Downloading...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0   336    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  214M  100  214M    0     0  1211k      0  0:03:01  0:03:01 --:--:-- 1429k
    Extracting license...
    Creating folder 'tools/corenlp-3.3.1'...
    Uncompressing...
    Available: tools/corenlp-3.3.1
--------------------------------------------------------------------------------
Training an LDA topic model using MALLET...
       corpus = data/demo/infovis/corpus/corpus.txt
        model = data/demo/infovis/model-mallet
  token_regex = \w{3,}
       topics = 20
        iters = 1000
--------------------------------------------------------------------------------
Importing a file into MALLET: [data/demo/infovis/corpus/corpus.txt] --> [data/demo/infovis/model-mallet/corpus.mallet]
Exception in thread "main" java.io.FileNotFoundException: data/demo/infovis/corpus/corpus.txt (No such file or directory)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:131)
    at cc.mallet.classify.tui.Csv2Vectors.main(Csv2Vectors.java:260)
Training an LDA model in MALLET: [data/demo/infovis/model-mallet/corpus.mallet] --> [data/demo/infovis/model-mallet/lda.mallet]
java.io.FileNotFoundException: data/demo/infovis/model-mallet/corpus.mallet (No such file or directory)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:131)
    at cc.mallet.types.InstanceList.load(InstanceList.java:787)
    at cc.mallet.topics.tui.Vectors2Topics.main(Vectors2Topics.java:378)
Exception in thread "main" java.lang.IllegalArgumentException: Couldn't read InstanceList from file data/demo/infovis/model-mallet/corpus.mallet
    at cc.mallet.types.InstanceList.load(InstanceList.java:794)
    at cc.mallet.topics.tui.Vectors2Topics.main(Vectors2Topics.java:378)
--------------------------------------------------------------------------------
Import a MALLET LDA topic model as a web2py application...
           app_name = infovis_mallet
           app_path = apps/infovis_mallet
         model_path = data/demo/infovis/model-mallet
    corpus_filename = data/demo/infovis/corpus/corpus.txt
  database_filename = data/demo/infovis/corpus/corpus.db
--------------------------------------------------------------------------------
Creating app: infovis_mallet [apps/temp_20170725_221922_651726_0528]
Creating folder: [apps/temp_20170725_221922_651726_0528/data]
Creating folder: [apps/temp_20170725_221922_651726_0528/databases]
Linking folder: [apps/temp_20170725_221922_651726_0528/models]
Linking folder: [apps/temp_20170725_221922_651726_0528/views]
Linking folder: [apps/temp_20170725_221922_651726_0528/controllers]
Linking folder: [apps/temp_20170725_221922_651726_0528/static]
Linking folder: [apps/temp_20170725_221922_651726_0528/modules]
Creating file: [apps/temp_20170725_221922_651726_0528/__init__.py]
David-Laxers-MacBook-Pro:termite-data-server davidlaxer$ ./demo.py infovis
--------------------------------------------------------------------------------
Build a topic model (mallet) using a demo dataset (infovis)
  database = data/demo/infovis/corpus
    corpus = data/demo/infovis/corpus
     model = data/demo/infovis/model-mallet
       app = infovis_mallet
--------------------------------------------------------------------------------
    Available: tools/corenlp-3.3.1
--------------------------------------------------------------------------------
Training an LDA topic model using MALLET...
       corpus = data/demo/infovis/corpus/corpus.txt
        model = data/demo/infovis/model-mallet
  token_regex = \w{3,}
       topics = 20
        iters = 1000
--------------------------------------------------------------------------------
    Already exists: data/demo/infovis/model-mallet
--------------------------------------------------------------------------------
Import a MALLET LDA topic model as a web2py application...
           app_name = infovis_mallet
           app_path = apps/infovis_mallet
         model_path = data/demo/infovis/model-mallet
    corpus_filename = data/demo/infovis/corpus/corpus.txt
  database_filename = data/demo/infovis/corpus/corpus.db
--------------------------------------------------------------------------------
Creating app: infovis_mallet [apps/temp_20170725_222309_156011_4992]
Creating folder: [apps/temp_20170725_222309_156011_4992/data]
Creating folder: [apps/temp_20170725_222309_156011_4992/databases]
Linking folder: [apps/temp_20170725_222309_156011_4992/models]
Linking folder: [apps/temp_20170725_222309_156011_4992/views]
Linking folder: [apps/temp_20170725_222309_156011_4992/controllers]
Linking folder: [apps/temp_20170725_222309_156011_4992/static]
Linking folder: [apps/temp_20170725_222309_156011_4992/modules]
Creating file: [apps/temp_20170725_222309_156011_4992/__init__.py]
Copying [data/demo/infovis/corpus/corpus.db] --> [apps/temp_20170725_222309_156011_4992/databases/corpus.db]
Copying [data/demo/infovis/corpus/corpus.txt] --> [apps/temp_20170725_222309_156011_4992/data/corpus.txt]
An error occured while creating app: infovis_mallet [apps/infovis_mallet]
Traceback (most recent call last):
  File "bin/read_mallet.py", line 85, in <module>
    main()
  File "bin/read_mallet.py", line 82, in main
    ImportMalletLDA( args.app_name, args.model_path, args.corpus_path, args.database_path, args.quiet, args.overwrite )
  File "bin/read_mallet.py", line 47, in ImportMalletLDA
    shutil.copy( corpus_filename, app_corpus_filename )
  File "/Users/davidlaxer/anaconda/lib/python2.7/shutil.py", line 119, in copy
    copyfile(src, dst)
  File "/Users/davidlaxer/anaconda/lib/python2.7/shutil.py", line 82, in copyfile
    with open(src, 'rb') as fsrc:
IOError: [Errno 2] No such file or directory: 'data/demo/infovis/corpus/corpus.txt'

Any ideas why corpus.txt is missing?
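
For anyone hitting the same thing: the curl error and the unzip message in the log suggest the archive never downloaded properly, so corpus.txt was never produced. Below is a minimal sketch, not part of the repository, that checks this directly. The function name diagnose_demo_download is hypothetical, the two paths are copied from the log above, and it is written for Python 3 (the demo itself ran under Python 2.7).

```python
import os
import zipfile


def diagnose_demo_download(zip_path, corpus_txt_path):
    """Return a list of problems found; an empty list means both files look fine.

    Hypothetical helper for illustration; it is not part of termite-data-server.
    """
    problems = []
    if not os.path.exists(zip_path):
        problems.append("archive missing: " + zip_path)
    elif not zipfile.is_zipfile(zip_path):
        # A failed or redirected download often leaves an empty file or an
        # HTML error page saved under the .zip name.
        problems.append("archive is not a valid zip: " + zip_path)
    if not os.path.exists(corpus_txt_path):
        problems.append("corpus text missing: " + corpus_txt_path)
    return problems


if __name__ == "__main__":
    # Paths as they appear in the log; run from the repository root.
    for problem in diagnose_demo_download(
        "data/demo/infovis/download/infovis.zip",
        "data/demo/infovis/corpus/corpus.txt",
    ):
        print(problem)
```

Note that the second run in the log prints "Already exists: data/demo/infovis/model-mallet" and skips the download, so the leftover folders from the failed first run probably need to be deleted before retrying.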

xinnyuann commented 5 years ago

I got the same error when trying the demo with 20newsgroups. I checked the data-fetch.sh file, and my guess is that the download link no longer exists. Could anyone point me in the right direction? Thanks so much.
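
One way to test the dead-link guess is to list the URLs a fetch script refers to and probe each one. The following is a rough Python 3 sketch under stated assumptions: extract_urls and is_reachable are hypothetical helper names, the regular expression is a simple approximation rather than a full URL parser, and the script path is passed on the command line because its location in the repository is not shown in this thread.

```python
import re
import sys
import urllib.request

# Rough pattern: an http(s) URL runs until whitespace, a quote, or a bracket.
URL_PATTERN = re.compile(r"https?://[^\s'\"<>)]+")


def extract_urls(script_text):
    """Return the http(s) URLs found in the text, in order, without duplicates."""
    urls = []
    for url in URL_PATTERN.findall(script_text):
        if url not in urls:
            urls.append(url)
    return urls


def is_reachable(url, timeout=10):
    """Return True if a HEAD request to url ends with a 2xx or 3xx status."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError, refused connections and timeouts;
        # ValueError covers strings that are not URLs at all.
        return False


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_links.py path/to/fetch-script.sh
    with open(sys.argv[1]) as handle:
        for url in extract_urls(handle.read()):
            print("ok  " if is_reachable(url) else "DEAD", url)
```

Some servers reject HEAD requests even when the file exists, so a "DEAD" result is a hint to check by hand, not proof that the link is gone.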