gswcm / ga-trees

Android application to represent Georgia Trees catalog

T5 into DB #1

Open simonbaev opened 10 years ago

simonbaev commented 10 years ago

Jeff, I believe it will be easier for you to get started with the DB if you work with prepared data. You can find such data in "Misc/T5". You are welcome to look through "Misc/dbfill.sh" and "Misc/commandLog" for some ideas on how the data can be manipulated in Bash.

I created a template for the SQLite database with 4 empty tables. It is stored in Misc/ga-trees.db.
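
One way to see what the template holds is to list its tables. The following is a minimal sketch in Python (the thread itself works in Bash and Java), assuming only that Misc/ga-trees.db is an ordinary SQLite file. The helper name `list_tables` is made up for illustration.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all user tables in an open SQLite connection."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    )
    # Each row is a 1-tuple holding the table name.
    return [name for (name,) in rows]
```

Calling `list_tables(sqlite3.connect("Misc/ga-trees.db"))` from the repository root should then return the names of the four template tables, provided the file is where the comment says it is.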

YujiaWang commented 10 years ago

Thank you, Simon. Yes, I feel a little bit lost when dealing with Bash commands and regular expressions. I will try to find some hints in the files you uploaded and see how they work.

Jeff.


simonbaev commented 10 years ago

Jeff, I installed the sqlite3 command-line tool in the playground. You can use it interactively (run it without arguments) or run it from a Bash script as "sqlite3 database-file 'some SQL code'".

YujiaWang commented 10 years ago

Dr. Baev, I used JDBC to set up a connection to SQLite. I have already extracted the descriptions from T5 and populated the tree_desc table with them. As I understand it, the leaves, fruit, bark, and tree columns in the tree_desc table will be used to store the paths of the corresponding images, so I have left those columns blank for now. In addition, I think we need the following information to populate the tree_main and tree_group tables:

bName and cName of each group, and their corresponding order (type), for populating the tree_group table.

aName, bName, and cName for populating the tree_main table.

Could you please extract that information from Georgia_trees.pdf so that I can use it to populate those tables?

Jeff.


gswcm commented 10 years ago

Jeff,

The fields in the tree_desc table are reserved for the parts of the description text that are bolded in the PDF file. They are already stored in the text (take another look at the PDF: they are the bold-faced words in the "description" paragraph for each tree).

I guess we don't need to store paths to images. The easiest way would be to identify the directory where all images for a certain tree are stored (the name of the directory is the same as the tree's common name) and then display all of them using some interface. In this case we don't need to store anything to refer to images!
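
The directory-per-tree idea could look roughly like this. It is a sketch in Python rather than the app's own code; the function name `images_for_tree`, the `images_root` parameter, and the list of file extensions are assumptions, since the thread does not say where the images live or which formats they use.

```python
from pathlib import Path

# Assumption: these extensions are a guess; the thread does not name the formats.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def images_for_tree(images_root, common_name):
    """Return sorted image paths from the directory named after the tree."""
    tree_dir = Path(images_root) / common_name
    if not tree_dir.is_dir():
        # No directory for this common name means no images to show.
        return []
    return sorted(
        p for p in tree_dir.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

Because the lookup key is the common name itself, the database needs no image columns at all, but the directory names then have to match the cName values exactly.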

From T5 you can extract pretty much everything you see in the tree_main table except aName (alternative name) and bName (botanical name). The cName field (common name) is there. I couldn't extract bName and aName from the PDF because the extraction engine did not recognize the italic fonts used to display this info. I guess we will need to populate tree_main for all tree records without references to images (92) and then update the tree_main table manually.
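
The two-pass approach (insert the common names first, add the other names later) might be sketched as follows in Python. The real schema of tree_main in Misc/ga-trees.db is not shown in this thread, so the column types and the primary key below are assumptions, as are the helper names `insert_common_names` and `set_names`; only the column names cName, aName, and bName come from the discussion.

```python
import sqlite3

# Assumption: the real tree_main schema is not shown in the thread; only the
# column names cName, aName, and bName are taken from it.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tree_main (
    cName TEXT PRIMARY KEY,  -- common name, available from T5
    aName TEXT,              -- alternative name, filled in later
    bName TEXT               -- botanical name, filled in later
)
"""


def insert_common_names(conn, common_names):
    """First pass: one row per tree, with aName and bName left NULL."""
    conn.executemany(
        "INSERT INTO tree_main (cName) VALUES (?)",
        [(name,) for name in common_names],
    )
    conn.commit()


def set_names(conn, common_name, botanical_name, alternative_name=None):
    """Second pass: add the names that could not be extracted from the PDF."""
    cur = conn.execute(
        "UPDATE tree_main SET bName = ?, aName = ? WHERE cName = ?",
        (botanical_name, alternative_name, common_name),
    )
    conn.commit()
    return cur.rowcount  # 0 means no row had that common name
```

Returning the row count from `set_names` makes it easy to spot a typo in a common name during the manual pass, since a mistyped name updates zero rows.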

I have a script that fills in the tree_main and tree_desc tables, but I would prefer to use yours for that purpose. Are you doing it in Java?

YujiaWang commented 10 years ago

Yes, I populated the tree_desc and tree_main tables with information from T5.txt using Java. I will populate the remaining fields manually. Hopefully it will be done today.

Jeff.
