taupilu / gbif-providertoolkit

Automatically exported from code.google.com/p/gbif-providertoolkit

Upload tab delimited source file to IPT 2.0 RC1-SNAPSHOT #393

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Add a tab-delimited source data file (the identical file works fine with 
IPT version 1.0 RC4).

What version of the provider software are you using? The version should be
displayed in the footer of any page.
  2.0-RC1-SNAPSHOT-r2371

What browser are you using?
  Firefox 3.6.10

Please provide any additional information below.

Adding the very same tab-delimited text file that I used with IPT 1.0 RC4 to 
the new IPT 2.0 RC1-SNAPSHOT causes an error message:

"Expression source.rdbms is undefined on line 65, column 64 in 
WEB-INF/pages/manage/source.ftl"

Page URL with error message:
http://njord.nordgen.org:8080/ipt_v2_rc1/manage/source.do?r=sesto&id=sesto_2_dwc_100

This is an example of the tab-delimited file I am using:
http://wwwdev.ngb.se/portal/scope/sesto/files/sesto_2_dwc_100.txt

Same error on CentOS Linux and on Mac OS X 10.6.

Uploading the same data in a DarwinCore Archive works fine!

Original issue reported on code.google.com by dag.endresen on 13 Oct 2010 at 3:57

GoogleCodeExporter commented 9 years ago
I opened the tab delimited file with MS Excel and saved again as CSV. Now 
uploading and mapping with IPT 2.0 RC1-SNAPSHOT works fine...
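
The same conversion can be scripted instead of done by hand in Excel. The following is only a minimal sketch in Python, not part of the IPT; the function name `tsv_to_csv` is hypothetical, and it assumes the tab-delimited input contains no quoted fields:

```python
import csv
import io


def tsv_to_csv(tsv_text: str) -> str:
    """Re-encode tab-delimited text as comma-delimited text."""
    # Read the input as plain tab-separated fields (assumption: no quoting).
    reader = csv.reader(
        io.StringIO(tsv_text, newline=""),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,
    )
    out = io.StringIO(newline="")
    # The writer quotes any field that itself contains a comma.
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()
```

For a file on disk, read its contents with `open(path, encoding="utf-8", newline="")` and pass the text to the function.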

Original comment by dag.endresen on 13 Oct 2010 at 4:32

GoogleCodeExporter commented 9 years ago

Original comment by timrobertson100 on 13 Oct 2010 at 5:46

GoogleCodeExporter commented 9 years ago
Dag, I cannot reproduce this error with the attached tab file on my local IPT. 
Is there anything in the IPT or Tomcat/Catalina log files that indicates what 
goes wrong?

The error in the screenshot is a secondary problem. The tab upload apparently 
goes wrong, and the IPT thinks that no file was uploaded and that a SQL source 
was created instead. That is why you get the FreeMarker error. So I'm curious 
what the reason for the failed upload is.

Original comment by wixner@gmail.com on 14 Oct 2010 at 5:28

GoogleCodeExporter commented 9 years ago
r2381 fixes the secondary freemarker issue, but the real problem still needs to 
be nailed down

Original comment by wixner@gmail.com on 14 Oct 2010 at 5:34

GoogleCodeExporter commented 9 years ago
When I try to map the tab-delimited file, all column headers are listed 
together across the page, as if the IPT does not recognize the tab delimiter. 
See screenshot...

I have tested with Linux-style line breaks (LF, decimal 10, 0x0A) and tab 
characters (decimal 9, 0x09), encoded as UTF-8.
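
One way to double-check these properties of a file is to inspect its raw bytes. This is a rough Python sketch, not an IPT feature; the function name `describe_delimiters` and the shape of the returned report are made up for illustration:

```python
def describe_delimiters(raw: bytes) -> dict:
    """Report UTF-8 validity, line-ending counts and tab counts per line."""
    try:
        raw.decode("utf-8")
        is_utf8 = True
    except UnicodeDecodeError:
        is_utf8 = False

    crlf = raw.count(b"\r\n")            # Windows-style line breaks
    lf_only = raw.count(b"\n") - crlf    # Linux-style line breaks
    # A consistently tab-delimited file has the same tab count on every line.
    tabs_per_line = sorted({line.count(b"\t") for line in raw.splitlines()})
    return {
        "utf8": is_utf8,
        "crlf": crlf,
        "lf": lf_only,
        "tabs_per_line": tabs_per_line,
    }
```

Open the file in binary mode (`open(path, "rb").read()`) so that no newline translation happens before the check.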

The error message above is shown when selecting "Analyze" for the uploaded 
tab-delimited data file. Sometimes, after adding the data file, I only see the 
error message when I revisit the source data object in the IPT interface.

Original comment by dag.endresen on 14 Oct 2010 at 7:01

GoogleCodeExporter commented 9 years ago
Same on both the CentOS Linux installation and the MacBook installation, and 
the file works fine with IPT 1 RC4.

Original comment by dag.endresen on 14 Oct 2010 at 7:02

GoogleCodeExporter commented 9 years ago
Hm, I don't have any problem with the sesto_2_dwc_100.txt file. Everything 
works fine: analyze, preview, mapping and DwC-A generation.

So adding the tab file as a new source does not result in an error, only 
hitting analyze does? Is there anything in the IPT log files that gives a hint 
where things go wrong?

Original comment by wixner@gmail.com on 18 Oct 2010 at 9:09

GoogleCodeExporter commented 9 years ago
I also see this working with a mac 10.5 Java 6 and Tomcat 5.5.30

Dag, would you please confirm you see this with the latest on 
http://code.google.com/p/gbif-providertoolkit/downloads/detail?name=ipt-2.0-SNAPSHOT-r2412.war

Original comment by timrobertson100 on 18 Oct 2010 at 3:28

GoogleCodeExporter commented 9 years ago
On the njord server we have deployed the new release as a separate application 
on the Tomcat server. The new application (named ipt2r2412) was pointed at the 
same data directory folder. However, the new IPT version seems to be locked to 
the base URL reported for the installation of the previous version (r2371). 
Perhaps the installation of a new IPT version should proceed from setup.do to 
setup2.do (Setup Part II) to allow for the edit of the base URL?

Testing the new r2412 on my MacBook, I find a similar problem to the one 
reported above. When loading the tab-delimited file I left the "Field 
Delimiter" blank. I have also tried entering "\t" and selecting option 1, 
"Tabulator", from the "i" button.

When I proceed to map the new dataset (from the tab delimited example file from 
SESTO), I find that all the column header names are listed sideways as one 
single option in the select list (and not down as selectable entries as when 
converting the tab delimited dataset file to a comma delimited file)
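
This symptom is what one would expect when a header line is split on a character that does not occur in it. A small Python illustration (not IPT code; `split_header` and the column names are hypothetical):

```python
def split_header(header_line: str, delimiter: str) -> list:
    """Split a header line into column names using the given delimiter."""
    return header_line.rstrip("\r\n").split(delimiter)


header = "id\tscientificName\tcountry\n"

# Splitting on the real tab character yields three selectable column names.
tab_columns = split_header(header, "\t")

# Splitting on a delimiter that is absent from the line (here a comma)
# yields a single entry holding the whole header line.
comma_columns = split_header(header, ",")
```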

Original comment by dag.endresen on 18 Oct 2010 at 5:08

GoogleCodeExporter commented 9 years ago
"Perhaps the installation of a new IPT version should proceed from setup.do to 
setup2.do (Setup Part II) to allow for the edit of the base URL?".  Please use 
http://code.google.com/p/gbif-providertoolkit/issues/detail?id=402 for 
commentary on that issue.

Relating to this issue:  Could you please create an Admin account for me on the 
latest version in njord so I can  try and observe any differences to my setup?  

Original comment by timrobertson100 on 18 Oct 2010 at 6:27

GoogleCodeExporter commented 9 years ago
Could you please try the following exact procedure:

a) create new resource
b) fill basic metadata
c) choose source file, select "add" button
d) click analyze
e) click preview

At this stage I see the correct data as a preview - if not please say so

f) select save - DO NOT ALTER ANYTHING ELSE ON THE PAGE HERE
g) go to mapping - seems fine

However, if I change anything on the page (e.g. add the \t field delimiter) I 
see the issue you report, and that is a bug.

Could you please either create an account for me as requested in comment 10, 
or confirm that this accurately reproduces the issue and that the workaround I 
propose is correct? The issue will be addressed either way, but I'd like to 
make sure we have an accurate, reproducible test procedure.

Thanks for all your feedback Dag.

Original comment by timrobertson100 on 18 Oct 2010 at 6:37

GoogleCodeExporter commented 9 years ago
Hi Markus,

Steps a) to g) followed

After upload I see the source data indicated as readable, with 54 columns and 
101 rows. After Analyze I see the source data indicated as readable but with 
only 1 column. After preview I see the dataset seemingly in columns, but not in 
the appropriate columns, and different from how it looks when I upload a CSV 
version of the file...

Not pressing anything different from following exactly your description, I 
still see all the column headers as only one option in the select list under 
mapping.

(PS. I select occurrence resource under options for metadata)

You are most welcome to test with your own account! I will send you the login 
by email.

Original comment by dag.endresen on 18 Oct 2010 at 7:11

GoogleCodeExporter commented 9 years ago
Hi Dag.  Urff, I thought I had nailed it.  Well, I at least identified some 
issues relating to this.  Please do mail an account to me, but note it is Tim, 
not Markus.

Original comment by timrobertson100 on 18 Oct 2010 at 7:15

GoogleCodeExporter commented 9 years ago
Using the account on the njord server I was able to confirm comment 11 is 
accurate:

a) create new resource
b) fill basic metadata
c) choose source file, select "add" button
d) click analyze
e) click preview

The attached file shows it previews correctly, and clicking save and going to 
the mapping allows me to correctly map the data.

http://code.google.com/p/gbif-providertoolkit/issues/detail?id=401 is coming 
into play and it does not auto map, but I can manually do it.

However, to reproduce the issue:

a) create new resource
b) fill basic metadata
c) choose source file, select "add" button
d) insert \t in the "Field Delimiter" 
e) click analyze

There is now no way to get this back to working.  If one deletes the Field 
Delimiter and hits analyze, then an unrecoverable "FreeMarker template error!" 
is shown - you have to delete the whole resource.

Original comment by timrobertson100 on 18 Oct 2010 at 7:38

GoogleCodeExporter commented 9 years ago
I see. It's because the backslash escapes are treated literally!
So when entering \t it really looks for 2 chars, "\" followed by "t", as the 
delimiter.
And that fails... I have added a check for the backslash escapes \t=tab, 
\n=newline, \r=carriage return (the Windows thing) and \s=space. If any of 
those occur they get replaced.
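
A replacement of that kind could look roughly like the following. This is a Python sketch for illustration only, not the actual IPT change (the IPT is a Java application); the names `ESCAPES` and `unescape_delimiter` are hypothetical:

```python
# Typed two-character escape sequences mapped to the characters they stand for.
ESCAPES = {
    "\\t": "\t",  # backslash + t -> tab
    "\\n": "\n",  # backslash + n -> newline
    "\\r": "\r",  # backslash + r -> carriage return
    "\\s": " ",   # backslash + s -> space
}


def unescape_delimiter(value: str) -> str:
    """Replace typed backslash escapes with the real delimiter characters."""
    for escaped, literal in ESCAPES.items():
        value = value.replace(escaped, literal)
    return value
```

With this in place, a user who types the two characters `\` and `t` into the delimiter field ends up with a real tab character, while a delimiter such as `,` passes through unchanged.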

Dag, can you try to use the tab character directly?
You can go into the (i) info box and select Tabulator

Original comment by wixner@gmail.com on 22 Oct 2010 at 3:28

GoogleCodeExporter commented 9 years ago
Using version 2.0-SNAPSHOT-r2431 on my MacBook, I find that the same example 
tab-delimited file as tested above loads fine when I simply add the file and 
press Preview. BUT after I press Analyze, the preview shows data that is not 
split into the appropriate cells.

The result is the same whether I leave the default field delimiter or select 
"Tabulator" from the "i" button to the left. After I press Analyze the preview 
does not look OK.

Original comment by dag.endresen on 22 Oct 2010 at 3:53

GoogleCodeExporter commented 9 years ago
[deleted comment]

GoogleCodeExporter commented 9 years ago
If I input "\\t" (without quotes) as delimiter then the Analyze button gives a 
reasonable response for line count etc, but now the preview shows no cells...?

Original comment by dag.endresen on 22 Oct 2010 at 4:04

GoogleCodeExporter commented 9 years ago
Hope r2451 has fixed this issue. I am going to provide a new war in a few 
minutes for testing, Dag

Original comment by wixner@gmail.com on 22 Oct 2010 at 4:07

GoogleCodeExporter commented 9 years ago
If I input "0x009" (without quotes) the analyze makes sense, and the preview at 
least looks somewhat better... yet it is not perfectly split into the 
appropriate cells.

Original comment by dag.endresen on 22 Oct 2010 at 4:08

GoogleCodeExporter commented 9 years ago
With Version 2.0-SNAPSHOT-r2450 "\t" for tabulator works perfectly :-)

Original comment by dag.endresen on 22 Oct 2010 at 4:20

GoogleCodeExporter commented 9 years ago
Also the loading of the full dataset from SESTO, tab delimited - including 
auto-mapping to the DwC Occurrence terms and the Germplasm extension terms 
works fine.

One minor thing is that the number of records is listed as 0 (zero) on the IPT 
home page and under the menu option Manage (when logged in). Selecting public 
and publish does not bring up the real number of records here. This is perhaps 
another issue reported elsewhere...?

Original comment by dag.endresen on 22 Oct 2010 at 4:33

GoogleCodeExporter commented 9 years ago
Glad to hear that. Definitely another issue. I remember having fixed something 
like that before, but let's see. 

Mind reporting it as another one? I'm keen on closing this one ;)

Original comment by wixner@gmail.com on 22 Oct 2010 at 4:43