pavlis opened this issue 5 years ago
Well, I actually don't know the answer to most of the points you listed here. Let me provide some thoughts.
The main reason for creating this new issue is that it illustrates the problems we are going to face interchanging C++ and Python code. It came up instantly because the new TimeSeries and Seismogram classes would use an ObjectID in their private area. Hence, the C++ code will need to be compiled with at least include paths and links to the MongoDB libraries. I am not sure how much trouble this will create. ObjectID itself is a simple thing that could easily be passed around, as it is represented by a hex string.
I am concerned about linking the MongoDB libraries into the C++ code, as that could potentially increase the complexity of the whole project. Currently I think we will only let the Python code interact with MongoDB. Our Python code might call the C++ code internally, but the ObjectID should only live within the scope of the Python code. I guess we need the ObjectID because of the ErrorLogger. To keep the design consistent, maybe that means part of the functionality of ErrorLogger should be implemented in Python instead of C++. It is not clear to me how exactly we can do this, but I do think the general guideline is to limit database-related code to Python only.
I'm kind of stuck on how to start. I think I need a playground where I can launch an instance of mongo and experiment. The UITS machines are problematic as a playground as they put things in weird places and I can't do hack fixes to avoid getting stuck. My Mac is the other option, but debugging code on Macs has been a long-term problem for me. I have an old machine I've been meaning to get running anyway that already has Ubuntu installed. I think I'll try to use it as my playground. Any suggestions for an alternative?
I think you should be able to achieve all of this in a Docker container. I have the container set up to run everything, and you should be able to debug the code there.
What are the things we want to pass back and forth between C code and python? As always, we want numerically intensive algorithms to be in C/C++ to make them run faster.
This is really the core of the problem here. I think anything that relates to the database should be in python. I need to get some more done in Python before I can answer this question. I think the model will be similar to what we are going to do with obspy - extend from the existing objects.
What converters do we need in the short term to maximize speed of implementation without putting us down a dark alley? Seems we need a seamless way to convert obspy python objects into something my C code can digest and vice versa. I think I know ways to do that, but what should be done in python and what in C is bewildering.
Maybe that means we should design the python interface carefully so that it provides a uniform interface to the users with underlying obspy and our C++ library interchangeable. This shouldn't be difficult in Python once we have a clean C++ interface.
I think we discussed some of this last week too, and it makes sense to keep all db interactions in python. That is a helpful starting principle that reduces the confusion. A couple of followups though:
So my next step to clean this up, I think, is to be more concrete about how we build this api in C++. Give me a few days.
I am not exactly sure why there is an internal transfer needed between the Python and C++ code. I was thinking of making something that works like this:
It seems to me that we don't actually need to make the connection between C++ and ObsPy (marked by the dashed line) work. The Python component should be able to hide the complexity underneath it. Also, the Python component here should be more than just a bunch of database CRUD routines. It should have its own definitions of TimeSeries and 3C Seismogram that extend from the ones in the C++ Python bindings. It should also be able to utilize the routines available in ObsPy through wrappers. Then, the 3C data there should follow your design rather than ObsPy's. I do think your design is better for handling the data in a workflow. Also, once it is in Python, we can use pickle for serialization easily. In this design, the C++ component is more like a seispp library 2.0 - it could be a standalone library for C++, and the Python code extends its ability through added features like pickle and MongoDB. In the meantime, we can still implement serialization with boost.serialization, but that won't be used by Python.
MsPASS Core I/O API
1. Design assumptions:
2.1. C - Create
The obvious core tool for this is collection col; col.insert_one(…). I think a comparable tool for an ensemble is insert_many if we accept assumption 4, but I suspect it will be easier to do with a loop over the ensemble members, saving each member with a variant of insert_one().
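To make that concrete, here is a minimal pymongo sketch of the two approaches. Everything in it is an assumption for illustration: the client/collection names (mspass, wf) and the idea that each ensemble member has already been flattened into a python dict.

```python
# Sketch only: assumes a running mongod and pre-flattened member dicts.
from pymongo import MongoClient

client = MongoClient()            # defaults to localhost:27017
col = client.mspass.wf            # hypothetical database/collection names

def save_members_one_by_one(member_docs):
    """One insert_one call per ensemble member; simplest to reason about."""
    return [col.insert_one(doc).inserted_id for doc in member_docs]

def save_members_bulk(member_docs):
    """Single insert_many call; faster if assumption 4 holds."""
    return col.insert_many(member_docs).inserted_ids
```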
I’ll initially define a set of procedures that might form the API. These are cast here as methods that assume “this” has entities (e.g. a collection object) that define where the data would be written. If we make it purely procedural each function below would need an argument defining the db handle (probably a collection object as I read it).
I know this isn’t proper python syntax, but I hope it gets the idea across:
// saves an obspy Trace object. Some attributes are required in obspy, some are optional.
// The optional list should name things that are not core but should be saved. It will need a default.
def dbsave(Trace t, Stats s, optional list of keys of values to save)
// save a TimeSeries object. The savelist thing contains a list of attributes to be written to the db.
// This will require some features similar to the obspy version above, but with some variations
def dbsave(TimeSeries d, MetadataList savelist)
// similar for 3C data
def dbsave(Seismogram d, MetadataList savelist)
// Ensembles are just collections of one of the above. The simplest approach is to have
// the writer just be a loop with calls to dbsave for the member type. May not be the fastest
// but remember the axiom: make it work before you make it fast.
// Ensemble metadata should always be saved and the algorithm should make sure ensemble
// values override any values in the members if they are present in both places.
def dbsave(TimeSeriesEnsemble d, MetadataList savelist_ensemble, MetadataList savelist_members)
def dbsave(ThreeComponentEnsemble d, MetadataList savelist_ensemble, MetadataList savelist_members)
Now I think the algorithm for all of these is roughly the same, but the implementation for obspy is rather different. Let me start with a generic algorithm for the C++ objects that use metadata.
1) Foreach s in MetadataList
   a. Switch on type
   b. Call the proper get method for the type (handle errors gracefully)
   c. Push the result to a python dict
2) Call a generic function to write the dict to mongo
3) Call a function to write the data array of sample values with fwrite or gridfs
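A rough python sketch of that algorithm, purely for illustration: the typed getters (get_double etc.), the d.data sample vector, and the wf collection name are assumptions about what the pybind11 wrappers will expose, not a settled API.

```python
# Sketch of the generic C++-object writer: Metadata -> dict -> insert_one,
# sample array -> gridfs. mdl is assumed to yield (key, type_name) pairs.
import gridfs
import numpy as np

def dbsave_generic(d, mdl, db):
    doc = {}
    for key, type_name in mdl:                       # 1) metadata -> python dict
        try:
            if type_name == 'double':
                doc[key] = d.get_double(key)
            elif type_name == 'int':
                doc[key] = d.get_long(key)
            elif type_name == 'bool':
                doc[key] = d.get_bool(key)
            else:
                doc[key] = d.get_string(key)
        except Exception as err:                     # handle errors gracefully
            print('dbsave: could not fetch', key, ':', err)
    fs = gridfs.GridFS(db)                           # 3) sample values to gridfs
    doc['gridfs_id'] = fs.put(np.asarray(d.data, dtype=np.float64).tobytes())
    return db.wf.insert_one(doc).inserted_id         # 2) dict -> mongo document
```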
For obspy:
1) Write the header dict data
2) Call the same function as above to write the data array of sample values
Obspy doesn’t have ensembles per se, but their Stream object is nearly identical in concept to a TimeSeriesEnsemble. It just doesn’t have a group header. A writer for one of these would be just a loop over the list of Trace objects calling the above function.
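A quick sketch of what that obspy-side writer could look like, with the usual caveats: the simple-type filter and the gridfs/wf names are placeholders, and real code would need to convert things like the UTCDateTime entries rather than drop them.

```python
# Sketch: write the stats header as a document, the samples to gridfs, and
# treat a Stream as nothing more than a loop over its Trace members.
import gridfs

def dbsave_trace(trace, db):
    stats = {k: v for k, v in dict(trace.stats).items()
             if isinstance(v, (int, float, str, bool))}   # keep only simple BSON types for now
    stats['gridfs_id'] = gridfs.GridFS(db).put(trace.data.tobytes())
    return db.wf.insert_one(stats).inserted_id

def dbsave_stream(stream, db):
    return [dbsave_trace(tr, db) for tr in stream]
```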
For the C++ code the ensemble metadata creates a complication that could be very useful, but which we will have to document carefully if we go this route. I would recommend the ensemble metadata be ALWAYS definitive over any member data AND be required to be stored in each member’s document. That will require writing a (quite trivial) C++ function with this prototype:
template <class T>
with a python wrapper. I think it might be possible to implement it with the += operator for Metadata, which we can wrap. This thing should not need an exception handler, as the algorithm is a loop over members calling put methods to copy each ensemble metadata attribute to each member. Once that function is called, the writer for any ensemble is just a loop over the members, just like the algorithm for an obspy Stream.
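If the += operator does get wrapped, the sync step reduces to something like the sketch below; the member attribute name is an assumption about how the ensemble container will be exposed.

```python
def sync_ensemble_metadata(ensemble):
    # Ensemble metadata is definitive: copy (and overwrite) it into every member
    # before the per-member writer loop runs.
    for member in ensemble.member:      # assumed name of the member vector
        member += ensemble              # assumes Metadata.__iadd__ is wrapped
    return ensemble
```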
2.2. R - Read
This is the hardest problem. I don’t see a way to do this consistently in obspy and the C++ library. I’ll start with the C++ library since that is the one I’ve been thinking about most and know best.
I think the basic loader could have this signature:
// return type T – I think in python this requires coding each supported T.
def dbread_T(ObjectID oid, (optional)MetadataList mdl)
I think the python script would call find to locate the right document. Then it would need a loop to go through mdl fetching each key:value pair in the list and posting it to a dict. So, one base procedure/method would be where T was a python dict. Seismic data readers would then take a form like this pseudocode:
1) Header=dbread_Metadata(oid,mdl);
2) D=dbread_array(oid); // This would load a float or double array D with values
3) Build T from Header and D (different for different class types).
4) Return T
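A sketch of those four steps with pymongo, following the same assumed layout as the writer sketch above (wf collection, gridfs_id field, mdl as (key, type) pairs); build_timeseries stands in for whatever constructor the wrappers end up exposing.

```python
import gridfs
import numpy as np

def dbread_Metadata(oid, mdl, db):
    doc = db.wf.find_one({'_id': oid})                 # 1) locate the document
    return {key: doc[key] for key, _ in mdl if key in doc}

def dbread_array(oid, db):
    doc = db.wf.find_one({'_id': oid}, {'gridfs_id': 1})
    fs = gridfs.GridFS(db)
    return np.frombuffer(fs.get(doc['gridfs_id']).read(), dtype=np.float64)

def dbread_TimeSeries(oid, mdl, db):
    header = dbread_Metadata(oid, mdl, db)             # 1-2) header dict and samples
    d = dbread_array(oid, db)
    return build_timeseries(header, d)                 # 3-4) hypothetical assembly step
```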
I think the only variation for obspy is that the mdl would have more importance and have some frozen keys. More on that below.
2.3. U - Update
Updates are a variant of the create algorithm. The primary difference for C++ objects is that the algorithm needs to fetch the ObjectID and call a find method before calling the update method. However, a feature I have in this version of Metadata makes the algorithm slightly more complicated but likely can make a significant difference in performance. That is, it has a private list with keys to the values that were changed since construction. (Note: I just noticed a hole in that api – there should be a method to retrieve that list of keys since it is not public.) The algorithm then should just take that list of keys and update only those.
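A sketch of that update path: d.modified() is a placeholder for the not-yet-public changed-key list noted above, d.get() stands in for whatever typed getters we end up binding, and wf_id is an assumed Metadata key holding the ObjectId hex string.

```python
from bson.objectid import ObjectId

def dbupdate(d, db):
    oid = ObjectId(d.get_string('wf_id'))          # ObjectId posted at read time
    changed = {key: d.get(key) for key in d.modified()}
    if changed:                                    # only push what actually changed
        db.wf.update_one({'_id': oid}, {'$set': changed})
```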
For obspy I think the only thing we can do is call the update method instead of the one that creates a new document. We will need to put the ObjectID in the stats dict.
2.4. D - Delete
This is simple provided we assure all objects, including the obspy Stats dict, include the ObjectID from which each data member was constructed.
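The delete sketch is then a one-liner once the ObjectId hex string is carried in Metadata or the obspy stats dict (collection name again hypothetical).

```python
from bson.objectid import ObjectId

def dbdelete(oid_hex, db):
    # oid_hex is the hex string of the ObjectId stored with the data object.
    db.wf.delete_one({'_id': ObjectId(oid_hex)})
```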
3.0 Defining common attribute keys
Both the python dict and Metadata are completely agnostic about what the key for any attribute should be – they use generic containers. In many ways the “schema” for MongoDB is little more than a specification of the keys and the value types for those keys. This gets into something that can get nauseating if handled by a committee, commonly called an ontology. We only need part of that topic, the namespace, but it is noteworthy for speaking with some audiences.
Some points we need to consider in this design:
I think our first step, as stated in your original proposal, is to define the required attributes for waveform data. These should give us the experience we need to go further.
Ian, I looked at your tests on the experimental branch for python. I have also spent quite a few hours trying to crack the boost documentation on installing and testing boost python. I am very frustrated by the documentation for boost python. I am pretty certain it has not been updated to reflect some major changes in the software. I'm hoping you can comment on the following points to confirm or deny I have this right and address questions:
No matter how we do the build I am definitely going to write the wrapper code manually for the C++ interface. We do not necessarily want to expose all C++ class methods and functions to python. Some are better left as C++ internals.
Actually, I didn't even start with the boost documentation, so I never tried the bjam related stuff. Instead, I think I found an example of using cmake with boost.python somewhere, and that's how I got the code to compile. I guess you should probably just use my cmake setup as the template. It is pretty straightforward anyway.
Ok, I'll use the cmake configuration you created. Note the boost version is anything but straightforward as bjam/b2 is yet another build system.
Working through this. Realized I had to make some changes to the cmake configuration files. Learned some useful things there, but I'm stuck on a configuration question I have no easy way to answer. I think you can answer it quickly though.
What I'd like to know is whether the version of boost you installed has a boost python library of some kind created by the install process for boost python. The issue is what is supposed to be done by this set of lines in your cmake configuration file in the cxx directory for the python branch:
FIND_PACKAGE(Boost COMPONENTS python)
if (NOT Boost_PYTHON_FOUND)
  FIND_PACKAGE(Boost REQUIRED COMPONENTS python${PYTHON_VERSION_MAJOR}${PYTHON_VERSION_MINOR})
Is there a library this is supposed to set that defines this line:
message(STATUS "Boost_LIBRARIES = ${Boost_LIBRARIES}")
Or is that supposed to be set in the probe for boost? Anyway, if you can tell me what Boost_LIBRARIES resolves to when you get this to work it would be helpful.
So, there is a difference between the FIND_PACKAGE(Boost COMPONENTS python) line and the FIND_PACKAGE(Boost) line earlier in the CMakeLists.txt. The latter will only look for the standard boost headers (which only defines Boost_INCLUDE_DIRS), while the former will find the boost.python library component in addition to the headers, and it will define the Boost_LIBRARIES variable (in the docker image, which has Ubuntu Bionic, it is defined as Boost_LIBRARIES = /usr/lib/x86_64-linux-gnu/libboost_python3.so). Note that depending on the version of the Boost library and the version of Python, the library may have different names (e.g. libboost_python.so, libboost_python2.7.so, libboost_python3.6.so, etc.). That's why I have the if branches, which try to capture all the different combinations.
The lines here make the earlier FIND_PACKAGE(Boost) line redundant, and the setup for finding boost.python right now will fail instead of installing our own version when it is not found. I do think I should revise this behavior, but I plan to do that later when all the python stuff is more solid.
Since you are working on an Ubuntu system, I think you should probably look at the Dockerfile, which has all the packages and setups based on Ubuntu. You should be able to get everything working following exactly the same recipe.
Picking this thread up here after initial testing with boost python, pybind11, and cython. From that work (discussed in "boost python setup #19 ") I have these important conclusions for implementing MsPASS:
Only in the last step would a C/C++ wrapper with pybind11 be needed.
A residual issue is that I am still struggling to define the line between C/C++ code and cython. A nasty issue with my seismic code is that the C++ code uses multiple inheritance. cython only recently seems to have added support for multiple inheritance. It is a hard problem because python extension types don't map cleanly onto C++ multiple inheritance. There doesn't seem to be a ton of information on the web on this topic and the standard documentation doesn't address it at all. I have been unable to make a wrapper for even a simple test class work, but I may still have something wrong in this implementation, which is almost as complicated as pybind11's. This morning I had a different idea I am going to pursue that might make this a lot simpler. That is, with cython I realized I didn't need to always wrap the C++ classes themselves. A better model may be to write C++ procedures that call C++ library routines and use cython to write the wrappers for those functions - much, much easier than wrapping a complicated class like TimeSeries that uses a lot of advanced C++ features.
I'm going to start by trying to write an obspy Trace to TimeSeries converter that can be compiled with cython. There will be some new challenges there, but it would yield a useful core function anyway.
Continued work with cython was not encouraging for using it as the vehicle for building our C++ to python api. I wrote a simple toy problem with multiple inheritance and ran into a long string of problems. Like pybind11 it was easy to build an interface for simple procedures and simple C++ classes. I did not crack the inheritance problem and kept hitting some odd issues that revealed some of the warts in cython.
In a fit of frustration I returned to pybind11 and tried the simple fix you suggested of changing
m=mspasspy.Metadata
to
m=mspasspy.Metadata()
Happily, this worked perfectly. I could put and get double, int and string variables without any issues. I did, however, discover a mismatch between the way the two languages define a boolean variable that is going to take some interfacing to clean up. Fortunately, booleans are not a top priority, as not many data formats explicitly include them; most (e.g. SEGY) use an integer 0 to define false and anything else true.
So that experience is changing my perspective. I suggest we may need to deal with both cython and pybind11. Here is what I would now advise:
So, I'm now going to move on to flesh out the rest of the wrappers for the TimeSeries and Seismogram objects using pybind11. Wish me luck.
A brief update on this issue. I've made a lot of progress on building wrappers for the C++ code to python. All the metadata related stuff has passed my current test program. I have CoreTimeSeries working as well. It took me a bit to understand an excessively terse description in the documentation of the proper way to map an std::vector container. It turned out you have to call a macro to bind each specific version of vector.
I have only two residual hurdles to get over to finish the core code bindings:
Overall this has been a lesson in something that really requires immersion in the documentation to be possible. The pybind11 documentation is terse and you have to know both python and (especially) C++ to comprehend what they are saying. They leave way too many holes that can only be filled by looking at their examples. The problem with many of their examples is that they are full of complicated template constructs that I find really hard to take apart and comprehend.
Anyway, I guess the key point for the record is I'm going to be the one to need to maintain these wrappers for as long as I'm around and capable. The wrapper code itself has tricks that make it a specialization that you don't want to learn other than adapting what we'll have here.
As you noticed, I checked in a new branch with the pybind11 wrappers and it fails to compile on all the pieces travis runs. I think I know the reason, but I don't know enough about cmake to fix it. The top-level CMake file has this section to configure pybind11:
if (PYTHONINTERP_FOUND)
  if (PYTHON_VERSION_MAJOR EQUAL 3)
    find_package(pybind11 REQUIRED)
    if(NOT pybind11_FOUND)
      message("pybind11 not found")
    endif()
  endif()
else()
  message("Python not found")
endif()
That is failing and the key line, I think, is this one:
-- Found PythonInterp: /opt/pyenv/shims/python (found version "2.7.15")
The logic of the above must be wrong, as it was supposed to exit if not python3. pybind11 only works with python3. It works on my ubuntu system because it has python3 set as the default.
The other thing that is your call is what to do with the installation of pybind11. I don't know if the find_package line should download and install pybind11, but that is what is failing. On my system it works because I installed pybind11 in the default location of /usr/local
I am hacking to get this to work on IU's carbonate system. For the record:
I then got this error from cmake which is fairly informative:
CMake Error at CMakeLists.txt:66 (find_package):
  By not providing "Findpybind11.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "pybind11",
  but CMake did not find one.

  Could not find a package configuration file provided by "pybind11" with any
  of the following names:

    pybind11Config.cmake
    pybind11-config.cmake

  Add the installation prefix of "pybind11" to CMAKE_PREFIX_PATH or set
  "pybind11_DIR" to a directory containing one of the above files. If
  "pybind11" provides a separate development package or SDK, be sure it has
  been installed.

-- Configuring incomplete, errors occurred!
See also "/N/u/pavlis/Carbonate/src/mspass/cxx/build/CMakeFiles/CMakeOutput.log".
See also "/N/u/pavlis/Carbonate/src/mspass/cxx/build/CMakeFiles/CMakeError.log".
[pavlis@i8 build]$
Not sure which of the suggestions to follow. If pybind11 has a "development package or SDK", that would be a good solution. I am assuming this worked for me on my ubuntu machine because I was able to install pybind11 in the stock location of /usr/local.
This is your decision, Ian, but I think the experimental branch implementing pybind11 is ready to be integrated into the master branch for this repository. pybind11 makes boost python obsolete, and I think we now know the role of cython is to allow users to speed up python scripts they develop by compiling them. It remains to be seen whether pybind11-wrapped C++ code will play well with cython, but the chatter on the pybind11 github site and the existence of test code with cython suggest those developers are on top of that issue and we should let them worry about it.
Before you merge the experimental branch, however, I would suggest you test that you can at least build the experimental branch on your mac. Would suggest you also run the simplified test I created this morning. I have a larger one that tested the seismogram and timeseries wrappers on the inaccessible machine at home. Suggest we can use the simplified test below as the core of a formal test later or you can build on this one and create your own.
This script tests only the new MetadataDefinitions implementation:
import mspasspy as msp
mdf=msp.MDDefFormat.PF
m=msp.MetadataDefinitions('test.pf',mdf)
s=m.concept('dt')
print(s)
a=m.keys()
print(a)
b=m.aliases('t0')
print(b)
You will need this file in the same directory where you run the above script. Copy and paste the following into 'test.pf':
dt &Arr{
  type real
  concept &Tbl{
    Sample interval in seconds.
  }
}
nsamp &Arr{
  type int
  concept &Tbl{
    Number of samples in a waveform segment.
  }
}
sta &Arr{
  type string
  concept &Tbl{
    Seismic station name code.
  }
}
t0 &Arr{
  type double
  concept &Tbl{
    Start time of a signal.
  }
}
aliases &Tbl{
  dt delta
  t0 starttime
  sta station
}
Also note I have already done the clerical work of creating a pf file like this for the full obspy Trace object stats content, but again that is stranded on my machine at home that is currently off and behind a home router.
This example raises an implementation issue: it would be more rational to store the MetadataDefinitions data as json/bson. We should probably have a json-based constructor in C++ to start, and if we can figure out how to pass a MongoDB handle to C++ we could have a db constructor. Anyway, THE KEY point is we definitely ought to support json. I think you have some standard libraries for this in C++ or know how to get them. I found some things in some old code of yours once, but I don't think I have that on this laptop. What do you suggest for this?
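Just to illustrate the point, the same test.pf content could be expressed as json and read with the standard library; the layout below is only a sketch, not a committed schema.

```python
import json

MDDEF_JSON = """
{
  "dt":    {"type": "real",   "concept": "Sample interval in seconds."},
  "nsamp": {"type": "int",    "concept": "Number of samples in a waveform segment."},
  "sta":   {"type": "string", "concept": "Seismic station name code."},
  "t0":    {"type": "double", "concept": "Start time of a signal."},
  "aliases": {"dt": ["delta"], "t0": ["starttime"], "sta": ["station"]}
}
"""

defs = json.loads(MDDEF_JSON)
print(defs['dt']['concept'])       # -> Sample interval in seconds.
```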
I think you are right. Our proposed workflow for pybind11 and cython should be fine for now.
I agree that the branch is ready to be merged. You may not have realized that I already fixed the docker and cmake build files while you were working on fixing other issues. You can see that the whole package builds fine in Travis (example). The build shows as failed solely because the metadata test got a seg fault. I need to take a look to see what's going on, but I think it should be something minor.
I will test out the python script and figure out how to integrate that into the CTest we have right now.
I agree that we need to at least support json as the configuration format. I think I once wrote a version of json-style metadata in your seispp library. I will see if I can still find it. It should be easy to adapt that to the current framework.
Send me your code when you find it or, better yet, modify it to mesh with bson and MongoDB. In particular, I think bson may support some types not supported by stock json, but I am not sure of that.
I see you are actively messing with this. I may have broken Metadata as I tweaked a couple of things while building the wrappers. I'll look at the history logs to see if I can get a hint about why it is seg faulting. Could be an api change that somehow didn't get propagated through.
I already figured it out. It is not really a seg fault. It is just testing the error logs, which throws some errors deliberately. The real issue is that this test is not written to comply with CTest. CTest determines whether a test is successful or not based on the exit code of a command. Anyway, not something to really worry about for now.
Finally found the code here. I can't believe this was done 5 years ago. Let me see how to make it work with bson.
I was reading through the pybind11 code and trying to learn it so that I can at least understand how to change it if something comes up down the road. I realized that the get method of Metadata is not bound yet, and I wonder if that is due to the use of boost::any. I came across this piece of code that seems to serve as a good example of how to make pybind11 work with boost::any. What do you think?
That code looks very interesting. I was planning a model where the MetadataDefinitions would be used to enforce type attached to any valid metadata. This could be more flexible, but I can't guess the relative cost in computational effort.
The reason I hadn't done the wrappers for the templated get is that I didn't see a way to do it. I think it would be possible to do something like define a python-only method using the get template. For example, I think we could define a get_float that was a call to the templated get.
Now I understand the issue much better. For documentation purposes: the get template cannot be overloaded through pybind11 because the return type check happens at runtime within our own implementation of the method, so there is no way for pybind11 to tell the correct type before the method is called.
I think we definitely don't want to impose that kind of type restriction on python programmers, as that would be very confusing in a duck-typing language. Potential solutions are implementing either a python wrapper of the get method around all the get_* methods, or another instance of the get method with the return type as boost::any and using pybind11's custom type caster to convert that type. The former will be easier to implement, and the latter will be cleaner to build and maintain. Either will need some effort to learn how to make it work with what we have here.
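A rough sketch of the first option: a python-side get that dispatches on the type recorded in MetadataDefinitions. The type() return values and the typed getter names are assumptions here, not the current bindings.

```python
def metadata_get(md, key, mdefs):
    """Hide the typed C++ getters behind one duck-typed python call."""
    t = mdefs.type(key)                      # assumed to return a type-name string
    if t in ('real', 'float', 'double'):
        return md.get_double(key)
    if t in ('int', 'long'):
        return md.get_long(key)
    if t == 'bool':
        return md.get_bool(key)
    return md.get_string(key)
```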
Not sure I agree with that, but my view is probably colored by a career working with languages where type was always a concern and hence just something to automatically consider. MongoDB will enforce type, as all data are stored with explicit type information one way or the other. I think we invite chaos if attributes can be stored without any way of enforcing the type associated with a specific key. That is at least true for any common attributes we want to define as part of a schema. So, I'm not sure we aren't inviting future problems by being too loose about type for attributes.
Well, I probably expressed this the wrong way. Actually, I totally agree that we need to have a schema for the database. The point above is really only about the get method of Metadata, which is functional in C++ but not in Python. I was just exploring how to resolve that issue - how to translate the returned template type from C++ to Python. This will not bring chaos, as the type that a user can get is already restricted by the keyword being used. This template is really serving the purpose of simplifying the code.
If we want to extend the capabilities to types beyond the current double, int, string, and bool, pybind11 does provide a way to bind templated functions under a unique name, e.g. if we want to provide support for a get instantiated for some other type.
That example brings up a complexity here. There are at least two different type issues we should think about. First are simple size issues, e.g. in terms of concept float, double, double double, real32, real64, and real128 (using C definitions) are the same thing. The same is true for int, long int, long, short, short int, int16, int32, and int64. bool is bool in any language as a concept, but how a bool is defined in a computer word is language dependent. Strings are similar. FORTRAN and C treat them differently, and C++ defines a string object which is more or less, but not exactly, the same as a char *. The point is, getting these low-level entities ought to be seamless. I think we can take advantage of python there and do this at the db level in mongo, i.e. we might write load_real, load_integer, load_boolean, and load_string functions in python that do all the checking against expected types, keys, and size specifications and hide that from the user. An approximate example prototype would have this signature: def load_real(dbhandle, key, mdef), where dbhandle is some object that interacts with mongo, key is the string key for the fetch, and mdef is a MetadataDefinitions object. load_real would be loose about size and simply return a standard double.
For this kind of attribute I think we would need a similar save_real function that would similarly handle conversions. It would have to be a bit cautious about things like int truncation, but that is a detail.
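A sketch of the load_real/save_real pair under those assumptions; note I added an oid argument for the document lookup, and the mdef.type() values are placeholders rather than the actual MetadataDefinitions api.

```python
def load_real(dbhandle, oid, key, mdef):
    if mdef.type(key) not in ('real', 'float', 'double', 'real32', 'real64'):
        raise TypeError(key + ' is not defined as a real-valued attribute')
    doc = dbhandle.find_one({'_id': oid}, {key: 1})
    return float(doc[key])                   # loose about size: always return a double

def save_real(dbhandle, oid, key, value, mdef):
    if mdef.type(key) not in ('real', 'float', 'double', 'real32', 'real64'):
        raise TypeError(key + ' is not defined as a real-valued attribute')
    dbhandle.update_one({'_id': oid}, {'$set': {key: float(value)}})
```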
The second kind of attribute we need to think about is the other "types" (aka objects) that MongoDB supports directly. For these I think we would want to be rigid about type in and out. It has been a couple of months since I reviewed the list of supported types, but it is fairly extensive. Anything not on that list would require serialization and a different kind of wrapper. These are likely best done as pure python entities anyway, as they do not map directly to the C++ interface.
That got MUCH longer than I had originally thought, but these ideas came to me as I was writing this response. In summary:
I thought pybind11's support for templates was limited to functions with a template type as an argument rather than as the return type. That's why we currently don't have T Metadata::get(string key) implemented in the Python interface. I think a wrapper in C++ should be able to resolve this issue, and I was thinking of using the custom type caster to implement that. Since you know pybind11 much better now, I guess you probably know some better way of doing that.
I agree the type support is going to be something we should carefully design, since C++, Python and MongoDB all support different types. Since the core here is the interface with the database, I guess we should make our Python and C++ APIs adopt what MongoDB has. I think the low-level types that MongoDB defines are 32-bit integer, 64-bit integer, 64-bit floating point, boolean, and UTF-8 string. Boolean and string have corresponding types in both Python and C++. The numbers are where the complexity comes in. For the Python API, this is still straightforward. (Note that there is a subtle difference between Python 2 and Python 3, but we only need to support Python 3 in this project.)
In Python, the int is unlimited in length, and the float is always 64-bit, so we can easily map them to the 64-bit integer and 64-bit floating point in MongoDB. We can optionally implement a 32-bit integer in Python, but I feel that is not really necessary.
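One pymongo detail worth noting here: small python ints are encoded as BSON int32 by default, so if we decide everything should be 64-bit we can force it with bson's Int64 wrapper. A tiny sketch (collection name hypothetical):

```python
from pymongo import MongoClient
from bson.int64 import Int64

col = MongoClient().mspass.wf        # hypothetical collection
# dt maps to a BSON double automatically; Int64 forces the 64-bit integer
# type even though 1000 would otherwise be encoded as int32.
col.insert_one({'nsamp': Int64(1000), 'dt': 0.01})
```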
In C++, we have some more types to deal with. For floating point, it is still straightforward, since any 32-bit float can automatically get converted to a 64-bit double. We only need to make sure anything that is loaded from the database will not get truncated accidentally. The integer types are messy since the length is platform dependent and the C++ standard only defines the lower limit of a type. However, we can pretty much assume that our code will only be running on a platform that supports 64-bit. Therefore, to ensure that integers read from the database won't get truncated, we should use long int for the 32-bit integer and long long int for the 64-bit integer. It is actually OK to use int and long int instead on a 64-bit Unix system according to here, but that could get us into trouble in some edge cases. It is also worth noting that there is no unsigned integer type in either Python or MongoDB, so any use of unsigned types in C++ needs to be handled carefully.
After reading what I wrote this morning and your response a few hours ago, I am wondering if we are worried about a problem that may not actually exist except on import and export. Generally, size issues only arise with data read from a foreign source in some standardized format (e.g. SEGY is full of obnoxious, archaic 16-bit ints). I think your summary of internal usage is pretty solid and we should probably stick to it: (1) all ints are 64-bit, (2) all reals are double (real64), and (3) strings are UTF-8. The last might be more limiting than necessary as there are lots of seamless wchar implementations these days. You would know better than me for sure, since I presume that is the standard for Chinese characters.
I agree, let's just stick to 64-bit int and double. I can definitely look up the wchar stuff, but don't really have to consider that for now I guess.
Have been mucking around with some things on a separate git branch. A few things emerged that I could use another head to help me sort out.
First a couple successes:
Problems that surfaced:
Other points I think we should address in this thread are:
There will be others, but I'm closing this as it is already too long.