nasa-jpl-memex / memex-explorer

Viewers for statistics and dashboarding of Domain Search Engine data
BSD 2-Clause "Simplified" License

Ache Issues on EC2 and Vagrant #575

Closed by brittainhard 9 years ago

brittainhard commented 9 years ago

This is a rather complicated issue. I realized that Ache crawls did not work on Vagrant or EC2. I fixed this issue in #573.

What I didn't figure out is why Ache was failing. The fix was to change MEDIA_ROOT from /home/vagrant/resources back to the original memex-explorer/source/resources. Ache was unable to properly read the model files when they were stored at /home/vagrant/resources.
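For reference, the MEDIA_ROOT change amounts to something like the following in the Django settings. This is only a sketch: the `media_root` helper and its `source_dir` argument are hypothetical names used for illustration, and the settings module touched in #573 may be laid out differently.

```python
import os


def media_root(source_dir):
    """Return the resources directory inside the project source tree.

    `source_dir` is assumed to be the path to memex-explorer/source.
    This helper is hypothetical and not taken from the repository.
    """
    return os.path.join(source_dir, "resources")


# In settings.py this might be used roughly as follows (assumed layout):
# MEDIA_ROOT = media_root(os.path.dirname(os.path.abspath(__file__)))
```

Deriving the path from the source tree rather than hard-coding /home/vagrant/resources keeps the same setting usable on Vagrant and on EC2.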

This fixed the problem on deployment but it didn't fix the problem on Vagrant. I'd love to talk with you about this on Monday @ahmadia and @amfarrell so we can figure out what caused this. I'll post the content of the error here.

brittainhard commented 9 years ago
[2015-06-11 20:02:24,123] WARN [main] (Main.java:150) - Data output path already exists, deleting everything
[2015-06-11 20:02:24,131] INFO [main] (ParameterFile.java:125) - CONFIGURATION FILE = /home/vagrant/resources/crawls/sampleache123/config/link_storage/link_storage.cfg
[2015-06-11 20:02:24,858] INFO [main] (AddSeeds.java:50) - Number of seeds:3444
[2015-06-11 20:02:24,920] INFO [main] (ParameterFile.java:125) - CONFIGURATION FILE = /home/vagrant/resources/crawls/sampleache123/config/link_storage/link_storage.cfg
LINK_CLASSIFIER:class focusedCrawler.link.classifier.LinkClassifierBaseline
[2015-06-11 20:02:24,928] INFO [main] (LinkStorage.java:252) - USE_SCOPE:false
[2015-06-11 20:02:25,033] INFO [main] (LinkStorage.java:258) - FRONTIER: class focusedCrawler.link.frontier.FrontierTargetRepositoryBaseline
>> TOTAL LOADED: 3444
[2015-06-11 20:02:25,159] INFO [main] (LinkStorage.java:359) - >> LOADING GRAPH...
[2015-06-11 20:02:25,231] INFO [main] (LinkStorage.java:366) - >> DONE GRAPH.
[2015-06-11 20:02:25,233] INFO [main] (ParameterFile.java:125) - CONFIGURATION FILE = /home/vagrant/resources/crawls/sampleache123/config/target_storage/target_storage.cfg
[2015-06-11 20:02:25,236] INFO [main] (ParameterFile.java:125) - CONFIGURATION FILE = /home/vagrant/resources/models/1/pageclassifier.features
[2015-06-11 20:02:27,113]ERROR [main] (Main.java:142) - Problem while starting crawler.
java.lang.IllegalArgumentException: Attribute names are not unique! Causes: '??????????????' '????????????????????' '?????????' '??????????????????' '?????????' '????????????????????' '??????????????' '????????????' '??????????' '???????????????' '??????????????' '????????????' '?????????' '??????????' '????????' '??????????????' '??????????????' '?????????' '????????????' '????????????' '????????' '??????????' '????????' '?????????' '????????????????????' '??????????????' '????????????' '??????????????????' '????????' '????????' '????????' '??????????' '????????' '???????????????' '????????????' '?????????????????????' '????????' '????????' '??????????' '????????' '????????????' '????????' '??????????' '??????' '??????' '??????' '????????????' '????????????????' '????????' '????????' '??????????' '??????????????' '????????????' '????????????' '????????' '????????' '??????????' '??????????' '??????????' '??????????' '??????????' '????????????' '??????????' '??????????' '??????????' '??????????????' '??????????' '??????????' '????????' '????????????' '??????????????' '??????????????' '??????????' '????????' '????????' '????????????' '????????????' '??????????' '????????' '????????????????????' '??????????????' '??????????' '????????????' '????????????' '??????????' '????????????' '??????????????' '????????' '????????????????' '??????????' '??????????' '??????????' '??????????????' '??????' '??????????????' '??????' '??????' '??????????' '??????' '??????' '????????' '??????????' '??????????' '??????' '??????????' '????????????' '??????????' '????????????????' '??????????' '??????????' '??????????' '??????????????' '????????????' '??????????' '????????' '????????????' '??????????????' '??????????????' '????????' '??????????' '????????' '??????????????' '????????????' '????????' '??????????' '??????????' '????????' '??????????????' '????????????????????' '????????????' '??????????????????' '????????' '??????' '??????????' '??????' '????????????' '????????????' 
'????????????????' '????????????????' '????????' '??????????' '????????' '?????????????????????' '???????????????????????????' '????????????????' '??????' '??????' '????????????????' '????????????' '??????????????' '????????' '??????????' '??????' '??????' '?????????????????????' '????????????' '????????????' '??????????????????' '??????????????????' '?????????' '????????????' '?????????????????????' '????????????' '?????????' '???????????????' '????????????' '??????????????????' '??????????????????' '?????????????????????' '????????????' '?????????' '????????????' '?????????' '????????????' '?????????????????????' '??????????????????' '??????????????????' '??????????' '??????????' '???????????????' '????????????' '???????????????????????????' '???????????????' '????????????' '????????????????????????' '?????????' '????????????' '??????????????????????????????' '???????????????' '??????' '?????????' '??????' '????????????' '????????????' '????????' '????????????' '????????????' '?????????' '??????????????????' '??????????????' '??????????????????' '????????' '???????????????????????????' '??????????????????' '?????????????????????????????????' '?????????????????????' '????????????' '??????????????????' '?????????????????????' '????????????????' '??????????????????' '?????????' '??????????????' '?????????????????????' '??????????????????' '????????????' '?????????' '???????????????' '????????' '??????????' '????????????????????????' '????????' '????????' '????????' '????????????' '??????????' '??????' '??????????' '????????' '?????????' '??????????' '????????????????????' '????????' '??????' '????????' '????????' '????????????????' '????????' '??????????' '????????????' '??????' '??????' '??????' '??????' '??????' '??????' '????????????' '????????????' '????????????' '????????' '????????' '????????' '????????' '????????????????' '??????????' '????????' '????????' '??????' '????????????' '??????' '??????????' '??????' '??????' '??????' '??????' '??????' '??????????' 
'??????????????????????' '??????????' '????????????????' '????????' '????????' '??????????' '??????????' '????????????' '????????' '????????' '?????????????????????' '??????????????????' '??????????????????????' '????????????' '??????????' '????????' '????????' '????????' '??????????????' '??????????' '??????????' '????????' '??????????????????' '????????????' '??????????????' '????????' '????????' '??????' '????????????????' '??????????????' '????????????????' '??????????' '??????????' '??????????' '????????' '??????????' '????????' '?????????' '??????????' '????????????' '??????????' '??????????' '????????' '??????' '??????????????' '??????' '??????' '??????' '??????' '??????' '??????' '??????' '??????' 
    at weka.core.Instances.<init>(Instances.java:259) ~[weka-stable-3.6.10.jar:na]
    at focusedCrawler.target.TargetStorage.createClassifier(TargetStorage.java:342) ~[ache-0.1.0.jar:na]
    at focusedCrawler.target.TargetStorage.createTargetStorage(TargetStorage.java:277) ~[ache-0.1.0.jar:na]
    at focusedCrawler.Main.startCrawl(Main.java:126) [ache-0.1.0.jar:na]
    at focusedCrawler.Main.main(Main.java:30) [ache-0.1.0.jar:na]
brittainhard commented 9 years ago

My gut feeling is that it has something to do with permissions.
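Another possibility worth ruling out, not confirmed anywhere in this thread: every attribute name in the exception is printed as a run of `?`, which is what non-ASCII text tends to look like after being decoded with the wrong character set. If the feature names were read that way, distinct names could collapse into identical strings and trigger a uniqueness error. Ache is Java, so the Python below is only an analogy, and the feature names in it are made up rather than taken from pageclassifier.features.

```python
def decode_as_ascii(name):
    """Simulate reading UTF-8 bytes with an ASCII decoder.

    Every non-ASCII byte is replaced with U+FFFD (often displayed as "?"),
    so the original characters are lost.
    """
    raw = name.encode("utf-8")
    return raw.decode("ascii", errors="replace")


def find_collisions(names):
    """Map each garbled name to its originals, keeping only the duplicates."""
    seen = {}
    for name in names:
        seen.setdefault(decode_as_ascii(name), []).append(name)
    return {garbled: originals
            for garbled, originals in seen.items()
            if len(originals) > 1}


# Made-up feature names for illustration only.
features = ["кот", "дом", "crawler"]
collisions = find_collisions(features)
```

Here the two Cyrillic names have the same byte length, so they become the same garbled string, while the ASCII name is unaffected. Checking the default encoding of the process that launches Ache would confirm or rule this out.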

ahmadia commented 9 years ago

From @brittainhard on Flowdock, the solution is to restart celery.

ahmadia commented 9 years ago

@amfarrell - please test on production / staging and report back in this issue.

amfarrell commented 9 years ago

This seems to be solved only when celery is not running as a background process. On production, I have started celery in a screen session. This isn't really sustainable, so this bug still stands.

ahmadia commented 9 years ago

We've got a temporary fix in production where the celery process is being executed as a user process. I suspect there's a weird file system interaction when celery is run as root (by salt). Too late in the game to investigate right now.
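If someone does pick this up later, one way to compare the root-run and user-run cases would be to log the process context from inside a celery task under each setup. The sketch below is an assumption about what might be worth comparing, not something that was run for this issue; `runtime_context` is a hypothetical helper name.

```python
import locale
import os
import sys


def runtime_context():
    """Collect settings that can differ between a daemon and a user shell."""
    return {
        # Effective user id (0 means root); None where os.geteuid is missing.
        "euid": os.geteuid() if hasattr(os, "geteuid") else None,
        "cwd": os.getcwd(),
        # Locale variables are often unset for processes started by a service manager.
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "preferred_encoding": locale.getpreferredencoding(False),
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for key, value in sorted(runtime_context().items(), key=lambda kv: kv[0]):
        print("{}: {}".format(key, value))
```

Any difference between the two outputs (user, working directory, or locale) would narrow down where the file system interaction comes from.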

ahmadia commented 9 years ago

We're using a conda-installed celery now by default, so this is no longer an issue.