Closed by pbinkley 8 years ago
To reproduce, I got uuid:7a385f26-6387-4f4e-9ba6-152044731c04.xml migrated with v1.2.1.
The output was:
ERROR [line: 34] With input '"Smith, John B. \"Web-Based Systems and Instruction.\" Web. < http://www.cs.unc.edu/Research/jbsAr': Invalid token "\"Smith," (found "\"Smith,"), production = :RDFLiteral
ERROR [line: 34] Unexpected (found "http:"(PNAME_NS)), production = "."
ERROR [line: 35] Unexpected (found "A"), production = ")"
ERROR [line: 35] undefined prefix "http"
ERROR [line: 35] With input '//www.cs.unc.edu/Research/jbsArchive/docs/AutoGeneratedSystems/ > Accessed 31 March, 2015.
Smith,': Invalid token "//www.cs.unc.edu/Research/jbsArchive/docs/AutoGeneratedSystems/" (found "//www.cs.unc.edu/Research/jbsArchive/docs/AutoGeneratedSystems/"), production = :predicateObjectList
ERROR [line: 36] With input 'Smith, John B. and Catherine F. Smith. \"ChicoryLane Farm.\" Website. < http://www.chicorylane.com': Invalid token "Smith," (found "Smith,"), production = :_turtleDoc_1
ERROR [line: 34] With input '"Smith, John B. \"Web-Based Systems and Instruction.\" Web. < http://www.cs.unc.edu/Research/jbsAr': Invalid token "\"Smith," (found "\"Smith,"), production = :RDFLiteral
ERROR [line: 34] Unexpected (found "http:"(PNAME_NS)), production = "."
ERROR [line: 35] Unexpected (found "A"), production = ")"
ERROR [line: 35] undefined prefix "http"
ERROR [line: 35] With input '//www.cs.unc.edu/Research/jbsArchive/docs/AutoGeneratedSystems/ > Accessed 31 March, 2015.
Smith,': Invalid token "//www.cs.unc.edu/Research/jbsArchive/docs/AutoGeneratedSystems/" (found "//www.cs.unc.edu/Research/jbsArchive/docs/AutoGeneratedSystems/"), production = :predicateObjectList
ERROR [line: 36] With input 'Smith, John B. and Catherine F. Smith. \"ChicoryLane Farm.\" Website. < http://www.chicorylane.com': Invalid token "Smith," (found "Smith,"), production = :_turtleDoc_1
Save file used 9.799644534
It appears the parser took exception to the URL http://www.cs.unc.edu/Research/jbsArchive/docs/AutoGeneratedSystems/ that ends up in the description.
@weiweishi what do we want the error message to say? I can include the id.
Maybe something like: "There was a problem with #{o['id']} and it was not included in the sitemap.xml"
Maybe we can have something like #{o['id']}: ERROR to be included in sitemap.xml, so the id can be parsed out more easily? Can it catch the "Invalid Token" error message? If we can have that information there, it would be great.
Weiwei Shi
I'm using Rails.logger.error, so the ERROR would be redundant, I think. It'll appear in the production logs as something like
E, [2016-06-19T04:43:20.439294 #13017] ERROR -- : There was a problem with 9593tv13c and it was not included in the sitemap.xml -
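A minimal sketch of the rescue-and-log pattern discussed above, with the id placed first so it can be parsed out of the log. The helper name each_sitemap_item, the sample ids, and the message wording are assumptions for illustration only; a stdlib Logger stands in for Rails.logger, and ArgumentError stands in for whatever the real load raises.

```ruby
require 'logger'
require 'stringio'

# Hypothetical helper (not from the HydraNorth code): yields each item,
# logs any failure with the id up front, and returns the ids that failed.
def each_sitemap_item(items, logger)
  failed = []
  items.each do |o|
    begin
      yield o
    rescue StandardError => e
      # id first, then the wording, then whatever the exception carries
      logger.error("#{o['id']}: not included in the sitemap.xml - #{e.class}: #{e.message}")
      failed << o['id']
    end
  end
  failed
end

# Stand-in for Rails.logger, writing to an in-memory buffer.
log = StringIO.new
logger = Logger.new(log)

items = [{ 'id' => 'abc123' }, { 'id' => '9593tv13c' }]
failed = each_sitemap_item(items, logger) do |o|
  # Simulated failure for one item; the real code would load the object here.
  raise ArgumentError, 'Model mismatch' if o['id'] == '9593tv13c'
end
```

Returning the failed ids as well as logging them would give a list for manual repair without having to grep the logs.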
Looks like the error I'm actually capturing is <ActiveFedora::ActiveFedoraError: Model mismatch. Expected GenericFile. Got: ActiveFedora::Base>
which doesn't have any of that information about the 'invalid token' that is printed to the screen.
Because it's handled in rdf-turtle-1.1.7/lib/rdf/turtle/reader.rb
# @option options [Boolean] :validate (false)
# whether to validate the parsed statements and values. If not validating,
# the parser will attempt to recover from errors.
...
rescue EBNF::LL1::Parser::Error, EBNF::LL1::Lexer::Error => e
  if validate?
    raise RDF::ReaderError.new(e.message, lineno: e.lineno, token: e.token)
  else
    $stderr.puts e.message
  end
end
I don't think I can influence the validate? outcome. The :validate option would have to be passed at
RDF::Reader.for(:ttl).new(StringIO.new(body), :base_uri => page_subject) do |reader|
[/usr/lib64/ruby/gems/2.1.0/gems/ldp-0.4.0/lib/ldp/response.rb:130], but it isn't :(
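Since the reader only writes the message with $stderr.puts, one possible workaround (an assumption, not something tried in this thread) would be to redirect $stderr to an in-memory buffer around the load and log whatever was captured. The names capture_stderr and noisy_load below are hypothetical; noisy_load merely simulates a load that prints a parser message.

```ruby
require 'stringio'

# Runs the block with $stderr temporarily redirected to an in-memory buffer
# and returns [block_result, captured_text]. $stderr is always restored.
def capture_stderr
  original = $stderr
  buffer = StringIO.new
  $stderr = buffer
  result = yield
  [result, buffer.string]
ensure
  $stderr = original
end

# Hypothetical stand-in for the object load that triggers the Turtle parser.
def noisy_load
  $stderr.puts 'ERROR [line: 34] Invalid token'
  :loaded
end

result, messages = capture_stderr { noisy_load }
# messages now holds the text that would otherwise only reach the screen,
# so it could be appended to the log line for the offending id.
```

Note that $stderr is process-global, so this swap would also swallow output from other threads while the block runs.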
Interesting. If we can capture the actual error message in the log it would be great. Otherwise, I'm fine with not having ERROR in the line, but it would be helpful to have ids up front. I will leave the rest of the language to you. This is going to help us a lot in going through existing objects, as it was not easy to detect them through audit.
Weiwei Shi
On era-test it still fails after ~24 hours. This time it seems to be related to the file operations that create the sitemap.xml and its fragments. Created #1249.
add error-handling to sitemap generation to log broken items for manual repair