heal-research / HeuristicLab

HeuristicLab - An environment for heuristic and evolutionary optimization
https://dev.heuristiclab.com
GNU General Public License v3.0

Improve regex in String2XMLSerializer #2382

Open HeuristicLab-Trac-Bot opened 9 years ago

HeuristicLab-Trac-Bot commented 9 years ago

Issue migrated from Trac ticket #2382

milestone: HeuristicLab 3.3.x Backlog | component: General | priority: medium

2015-05-04 16:46:26: @Shabbafru created the issue


I ran into the limitation that arrays cannot be larger than 2 GB in total size. I propose enabling support for larger arrays in .NET. This option was introduced with .NET 4.5 and can be enabled by adding `<gcAllowVeryLargeObjects enabled="true" />` to the `<runtime>` section of the app.config. See https://msdn.microsoft.com/en-us/library/hh285054%28v=vs.110%29.aspx for more details.

HeuristicLab-Trac-Bot commented 9 years ago

2015-05-04 16:46:39: @Shabbafru changed status from new to assigned

HeuristicLab-Trac-Bot commented 9 years ago

2015-05-04 16:46:39: @Shabbafru changed owner from @Shabbafru to @s-wagner

HeuristicLab-Trac-Bot commented 9 years ago

2015-05-05 09:43:43: @Shabbafru commented


Example of such an exception:

```
HeuristicLab.Clients.Hive.SlaveCore.TaskFailedException: Task failed with reason: HeuristicLab.Persistence.Core.PersistenceException: Unexpected exception while trying to parse object of type "System.String, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089". ---> System.OutOfMemoryException: Array dimensions exceeded supported range.
   at System.Text.RegularExpressions.RegexRunner.DoubleTrack()
   at System.Text.RegularExpressions.RegexInterpreter.Goto(Int32 newpos)
   at System.Text.RegularExpressions.RegexInterpreter.Go()
   at System.Text.RegularExpressions.RegexRunner.Scan(Regex regex, String text, Int32 textbeg, Int32 textend, Int32 textstart, Int32 prevlen, Boolean quick, TimeSpan timeout)
   at System.Text.RegularExpressions.Regex.Run(Boolean quick, Int32 prevlen, String input, Int32 beginning, Int32 length, Int32 startat)
   at System.Text.RegularExpressions.MatchCollection.GetMatch(Int32 i)
   at System.Text.RegularExpressions.MatchEnumerator.MoveNext()
   at HeuristicLab.Persistence.Default.Xml.Primitive.String2XmlSerializer.Parse(XmlString x)
   at HeuristicLab.Persistence.Interfaces.PrimitiveSerializerBase`2.HeuristicLab.Persistence.Interfaces.IPrimitiveSerializer.Parse(ISerialData data)
   at HeuristicLab.Persistence.Core.Deserializer.PrimitiveHandler(PrimitiveToken token)
   --- End of inner exception stack trace ---
   at HeuristicLab.Persistence.Core.Deserializer.PrimitiveHandler(PrimitiveToken token)
   at HeuristicLab.Persistence.Core.Deserializer.Deserialize(IEnumerable`1 tokens)
   at HeuristicLab.Persistence.Default.Xml.XmlParser.Deserialize(Stream stream)
   at HeuristicLab.Persistence.Default.Xml.XmlParser.Deserialize[T](Stream stream)
   at HeuristicLab.Clients.Hive.PersistenceUtil.Deserialize[T](Byte[] sjob)
   at HeuristicLab.Clients.Hive.SlaveCore.Executor.Start(Byte[] serializedJob)
```

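The top frames show the failure inside `RegexRunner.DoubleTrack()`, which presumably grows the regex interpreter's backtracking stack. The Python sketch below is a simplified model, not the .NET implementation: a stack that doubles its storage whenever it is full and fails once the doubled size would exceed a fixed maximum array length. `MAX_ARRAY_LENGTH`, the initial capacity of 16, the one-entry-per-character growth, and the names `DoublingStack` and `ArrayDimensionsExceeded` are illustrative assumptions. The point it shows is that the allocation fails as soon as the *requested* size crosses the limit, which can happen when the stack holds just over half of that maximum.

```python
class ArrayDimensionsExceeded(MemoryError):
    """Stands in for .NET's OutOfMemoryException ("Array dimensions exceeded supported range")."""


MAX_ARRAY_LENGTH = 1000  # hypothetical cap; the real CLR limit is far larger


class DoublingStack:
    """Array-backed stack that doubles its storage when full (simplified model)."""

    def __init__(self, initial_capacity: int = 16) -> None:
        self._items = [0] * initial_capacity
        self._count = 0

    @property
    def capacity(self) -> int:
        return len(self._items)

    def push(self, value: int) -> None:
        if self._count == len(self._items):
            new_capacity = 2 * len(self._items)
            if new_capacity > MAX_ARRAY_LENGTH:
                # The failure is triggered by the size of the *requested* array.
                raise ArrayDimensionsExceeded(
                    f"cannot grow from {len(self._items)} to {new_capacity} slots")
            # Copy the old contents into a twice-as-large array.
            self._items = self._items + [0] * len(self._items)
        self._items[self._count] = value
        self._count += 1

    def __len__(self) -> int:
        return self._count


if __name__ == "__main__":
    stack = DoublingStack()
    try:
        # Assumed: one entry per input character, as a backtracking loop might record.
        for position in range(10_000):
            stack.push(position)
    except ArrayDimensionsExceeded as error:
        print(f"failed after {len(stack)} entries: {error}")
```

Running it prints `failed after 512 entries: cannot grow from 512 to 1024 slots`: the stack had filled only 512 of the 1000 allowed slots when the doubling request failed.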
HeuristicLab-Trac-Bot commented 9 years ago

2015-05-11 15:32:54: @Shabbafru changed owner from @s-wagner to @epitzer

HeuristicLab-Trac-Bot commented 9 years ago

2015-05-11 15:32:54: @Shabbafru changed title from "Add support for large arrays" to "Improve regex in String2XMLSerializer"

HeuristicLab-Trac-Bot commented 9 years ago

2015-05-11 15:32:54: @Shabbafru commented


There is a regex in String2XmlSerializer that could perhaps be improved to reduce memory consumption:

```csharp
private static readonly Regex re = new Regex(@"<!\[CDATA\[((?:[^]]|\](?!\]>))*)\]\]>|([^<]*)", RegexOptions.Singleline);
```

Could you have a look at it? Could backtracking be the problem?
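To make the question concrete, here is a Python sketch of what the pattern matches, assuming that with its escapes and quantifiers intact it reads `<!\[CDATA\[((?:[^]]|\](?!\]>))*)\]\]>|([^<]*)` (group 1 is a CDATA body, group 2 is plain text). It also assumes the writer stores a string as CDATA and splits any `]]>` in the payload across two sections; `format_xml_string`, `parse_with_regex` and `parse_with_find` are hypothetical names, not HeuristicLab's API. `parse_with_find` yields the same result using only substring searches, so it keeps no per-character backtracking state. It is shown for comparison only, and Python's regex engine does not reproduce the .NET memory behaviour.

```python
import re

# re.DOTALL corresponds to .NET's RegexOptions.Singleline (harmless here: no '.' is used).
CDATA_RE = re.compile(r"<!\[CDATA\[((?:[^]]|\](?!\]>))*)\]\]>|([^<]*)", re.DOTALL)

OPEN, CLOSE = "<![CDATA[", "]]>"


def format_xml_string(s: str) -> str:
    """Hypothetical writer: wraps s in CDATA, splitting every ']]>' across two sections."""
    return OPEN + s.replace("]]>", "]]]]><![CDATA[>") + CLOSE


def parse_with_regex(data: str) -> str:
    """Concatenates CDATA bodies and plain-text runs (assumed to mirror the serializer's match loop)."""
    parts = []
    for m in CDATA_RE.finditer(data):
        if m.group(1) is not None:
            parts.append(m.group(1))
        elif m.group(2) is not None:
            parts.append(m.group(2))
    return "".join(parts)


def parse_with_find(data: str) -> str:
    """Same result using substring searches only; no per-character backtracking state."""
    parts = []
    pos = 0
    while pos < len(data):
        start = data.find(OPEN, pos)
        end = data.find(CLOSE, start + len(OPEN)) if start != -1 else -1
        if start == -1 or end == -1:
            # No further complete CDATA section: the regex keeps the text minus any '<'.
            parts.append(data[pos:].replace("<", ""))
            break
        parts.append(data[pos:start].replace("<", ""))  # plain text before the section
        parts.append(data[start + len(OPEN):end])       # CDATA body up to the first ']]>'
        pos = end + len(CLOSE)
    return "".join(parts)
```

For example, `format_xml_string("a]]>b<c>")` produces `<![CDATA[a]]]]><![CDATA[>b<c>]]>`, and both parsers turn it back into `a]]>b<c>`. The CDATA branch always stops at the first `]]>`, which is why a plain `find` is enough to reproduce it.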

HeuristicLab-Trac-Bot commented 9 years ago

2015-07-07 00:20:05: @Shabbafru changed milestone from HeuristicLab 3.3.12 to HeuristicLab 3.3.x Backlog

HeuristicLab-Trac-Bot commented 9 years ago

2015-07-07 00:20:05: @Shabbafru commented


I discussed this with epitzer. The regex seems to be OK, so the cause of this problem is the large Hive job. As the gcAllowVeryLargeObjects flag is not an option, there is no quick way to fix this at the moment.