tdt / core

Transform any dataset into an HTTP API with The DataTank
http://thedatatank.com

Syntax error (typo) in DataController file for MongoDB #414

Closed: dingsoyr closed this issue 7 years ago

dingsoyr commented 8 years ago

In the file https://github.com/tdt/core/blob/master/app/Tdt/Core/DataControllers/MONGOController.php there is a typo.

In the first line, `<?php namespace Tdt\Core\Datacontrollers;`, the declared namespace is wrong... it should be "DataControllers", not "Datacontrollers".

That is ... a missing capital letter "C".

This causes an error when viewing datasets that reside in a MongoDB database.
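For context, the fix is a one-character change to the namespace declaration in that file so it matches the directory casing:

```php
<?php
// app/Tdt/Core/DataControllers/MONGOController.php

// Broken: the lowercase "c" does not match the casing used by the
// other DataControllers, so the class cannot be resolved:
// namespace Tdt\Core\Datacontrollers;

// Fixed:
namespace Tdt\Core\DataControllers;
```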

coreation commented 8 years ago

Good catch, thanks! Considering it's a typo, I've changed it directly on master as a hotfix.

dingsoyr commented 8 years ago

Sweet... thanks!

coreation commented 8 years ago

You're welcome, can I ask what you're using it for? Always nice to know the user base :).

dingsoyr commented 8 years ago

I was mainly testing the application to see if it could be used here in Norway. We have some large datasets, and reading the CSV file on the fly is not an option if things are to stay speedy, so I wanted to test the performance of using MongoDB as the backend for the dataset.

coreation commented 8 years ago

I see. The solution we use for one of our clients is to ingest the data into Elasticsearch automatically. We use the tdt/input package to configure a job that reads the data from the CSV file and ingests it into an index of your choosing (pick a dedicated index for it, though), purely through configuration. Once that's done, you can create an Elasticsearch datasource to read the data back out of Elasticsearch, and you get a default wildcard parameter for searching that data out of the box.
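To make the ingestion step concrete: this is not the actual tdt/input job format, just a minimal standalone sketch of the same idea using the official elasticsearch-php client. The index name and CSV path are placeholders.

```php
<?php
// Standalone sketch: read a CSV file and bulk-index its rows into a
// dedicated Elasticsearch index. tdt/input drives this through job
// configuration instead; this only illustrates the concept.
require 'vendor/autoload.php';

use Elasticsearch\ClientBuilder;

$client = ClientBuilder::create()->build();

$handle = fopen('data.csv', 'r');   // placeholder CSV path
$header = fgetcsv($handle);         // first row holds the column names

$params = ['body' => []];
while (($row = fgetcsv($handle)) !== false) {
    // One action line plus one document line per row.
    // (Older Elasticsearch versions also require a '_type' here.)
    $params['body'][] = ['index' => ['_index' => 'my-dataset']];
    $params['body'][] = array_combine($header, $row);
}
fclose($handle);

if (!empty($params['body'])) {
    $client->bulk($params);   // for large files, flush in chunks instead
}
```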

As far as MongoDB goes, I don't think we support exporting data into MongoDB directly. Maybe there are other tools for that, or you can do a pull request to the tdt/input package ;)

dingsoyr commented 8 years ago

I see... Elasticsearch seems to be the better choice then :-) Does that mean that the user can search the Elasticsearch data through the "datatank" interface?

In regards to Mongo, I just did a command-line import of the CSV file directly into the database.
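For anyone trying the same thing, the mongoimport tool handles this; with placeholder database and collection names, the command looks something like:

```
mongoimport --db mydb --collection mydataset --type csv --headerline --file data.csv
```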

coreation commented 8 years ago

That is correct. The datatank interface will return JSON, but you're able to pass a "q" request parameter with a string, which will be passed on to the Elasticsearch controller as a query.
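As a made-up example (the dataset URI depends on how the source is defined), such a search request could look like:

```
GET http://your-datatank-host/your/dataset.json?q=searchterm
```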

Be advised though, the standard queueing system is "sync", which means that a job is executed immediately and, depending on your file, can take a while. It also means that if you add several of those jobs one after another, each one has to wait until the previous one is done. For real queueing, beanstalkd is also supported out of the box by Laravel 4 (the framework this was built in).
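For reference, the queue driver in Laravel 4 is set in app/config/queue.php; a minimal sketch with example values only:

```php
<?php
// app/config/queue.php (Laravel 4), example values

return array(

    // 'sync' executes jobs inline; beanstalkd queues them properly.
    'default' => 'beanstalkd',

    'connections' => array(

        'beanstalkd' => array(
            'driver' => 'beanstalkd',
            'host'   => 'localhost',
            'queue'  => 'default',
        ),

    ),

);
```

A worker process then picks jobs up with `php artisan queue:listen`.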

Mongo-wise: neat! Whatever works :)

dingsoyr commented 8 years ago

Cool... I will see if I can test this with an Elasticsearch database :-)