llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Overeager setting of eof() #9707

Closed: llvmbot closed this issue 13 years ago

llvmbot commented 13 years ago
Bugzilla Link 9335
Resolution FIXED
Resolved on May 13, 2011 13:44
Version unspecified
OS All
Reporter LLVM Bugzilla Contributor

Extended Description

I believe the following code shows that libc++ is over-eager in setting eofbit when reading chars. Certainly the behavior disagrees with libstdc++ and with my very quick reading of the stream parts of the standard.

#include <sstream>
#include <cassert>

using namespace std;

int main(void)
{
    istringstream ss("1");
    char c;
    ss >> c;
    assert(!ss.eof());
}

llvmbot commented 13 years ago

There are two places I believe are having problems.

lib/test/uuid/test_io.cpp has:

{
    uuid u;
    std::stringstream ss;
    ss << "00000000-0000-0000-0000-000000000000";
    ss >> u;
    BOOST_TEST_EQ(u, u1);

    ss << "12345678-90ab-cdef-1234-567890abcdef";
    ss >> u;
    BOOST_TEST_EQ(u, u3);
}

This fails (I believe) because eofbit is getting set after the first extraction, so the stream refuses to do any further work when the test goes on to insert and extract the second uuid.
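
A minimal sketch (my own illustration, not the Boost test itself) of why a prematurely set eofbit breaks this insert-then-extract pattern: once eofbit is on, the next formatted extraction's sentry sees a non-good stream and sets failbit instead of reading.

#include <cassert>
#include <sstream>

int main()
{
    std::stringstream ss;
    ss << "x";
    char c = 0;
    ss >> c;          // reads 'x'; per this report, libc++ also sets eofbit here
    if (ss.eof())     // only relevant on an implementation that sets eofbit
    {
        ss << "y";    // with eofbit set, the insertion's sentry is false, so nothing is written
        char d = 0;
        ss >> d;      // and this extraction fails outright (its sentry sets failbit)
        assert(ss.fail());
        assert(d == 0);
    }
}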

There are also problems in spirit. For example in libs/spirit/test/qi/match_manip2.cpp. The problem arises in match_manip.hpp (same directory):

{
    std::istringstream istrm(toparse);
    istrm.unsetf(std::ios::skipws);
    istrm >> mm;
    return istrm.good() || istrm.eof();
}

The problem is that when we try to parse "abc" as 'char('a') >> char('b') >> char('z')', querying the last letter causes 'eof' to be activated. It was in spirit that I noticed the inconsistency; I'm not 100% sure this is what's causing it, since trying to trace through spirit code is not for the faint of heart!

Looking again at 27.7.1.1 [istream] p3, I wonder about the code:

__c = *__i;
if (++__i == __eof)
    __is.setstate(ios_base::eofbit);

If this is changed to:

__c = *__i;
++__i;

Then I get the same behaviour as g++.

I believe the implementation for char should just be

try
{
    int_type c = this->rdbuf()->sbumpc();
    if (c == eof)
        set eofbit;
    return c;
}
catch (...)
{
    set failbit;
}

The difference between char and everything else is that for an int you have to look one character past the end of the number being read to make sure you have reached the end of it; for a single char no such look-ahead is needed.
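
Here is a rough standalone sketch of that suggestion (a hypothetical free function, not the libc++ operator>>, with the exception handling omitted; it also sets failbit when nothing is extracted, as the standard requires): grab the character with sbumpc() and only touch eofbit when sbumpc() itself reports eof, so no look-ahead ever happens.

#include <istream>

template <class CharT, class Traits>
std::basic_istream<CharT, Traits>&
extract_char_sketch(std::basic_istream<CharT, Traits>& is, CharT& c)
{
    typename std::basic_istream<CharT, Traits>::sentry sen(is);
    if (sen)
    {
        typename Traits::int_type i = is.rdbuf()->sbumpc();
        if (Traits::eq_int_type(i, Traits::eof()))
            is.setstate(std::ios_base::eofbit | std::ios_base::failbit);  // nothing extracted
        else
            c = Traits::to_char_type(i);  // character consumed; eofbit left alone
    }
    return is;
}

Calling this on an istringstream containing "1" leaves eof() false after reading '1'; eofbit only appears on the extraction after that, which matches the g++ behaviour shown later in this thread.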

llvmbot commented 13 years ago

I've reread the appropriate parts of the standard. I'm just not seeing where char should be treated differently than any other type in this regard. All extraction is done using istreambuf_iterator, whether it is int, double, bool, or char. libc++ isn't going out of its way to treat char differently, or to set eofbit.

For every type, every time the istreambuf_iterator is incremented, it is checked against an end-of-stream istreambuf_iterator, and if the two compare equal, eofbit is set. Here's the code for char:

template<class _CharT, class _Traits>
basic_istream<_CharT, _Traits>&
operator>>(basic_istream<_CharT, _Traits>& __is, _CharT& __c)
{
#ifndef _LIBCPP_NO_EXCEPTIONS
    try
    {
#endif  // _LIBCPP_NO_EXCEPTIONS
        typename basic_istream<_CharT, _Traits>::sentry __sen(__is);
        if (__sen)
        {
            typedef istreambuf_iterator<_CharT, _Traits> _I;
            _I __i(__is);
            _I __eof;
            if (__i != __eof)
            {
                __c = *__i;
                if (++__i == __eof)
                    __is.setstate(ios_base::eofbit);
            }
            else
                __is.setstate(ios_base::eofbit | ios_base::failbit);
        }
#ifndef _LIBCPP_NO_EXCEPTIONS
    }
    catch (...)
    {
        __is.__set_badbit_and_consider_rethrow();
    }
#endif  // _LIBCPP_NO_EXCEPTIONS
    return __is;
}

Here's the code for long:

template <class _CharT, class _InputIterator>
_InputIterator
num_get<_CharT, _InputIterator>::do_get(iter_type __b, iter_type __e, ios_base& __iob,
                                        ios_base::iostate& __err, long& __v) const
{
    // Stage 1
    int __base = this->__get_base(__iob);
    // Stage 2
    char_type __atoms[26];
    char_type __thousands_sep;
    string __grouping = this->__stage2_int_prep(__iob, __atoms, __thousands_sep);
    char __a[__num_get_base::__num_get_buf_sz] = {0};
    char* __a_end = __a;
    unsigned __g[__num_get_base::__num_get_buf_sz];
    unsigned* __g_end = __g;
    unsigned __dc = 0;
    for (; __b != __e; ++__b)
        if (this->__stage2_int_loop(*__b, __base, __a, __a_end, __dc, __thousands_sep,
                                    __grouping, __g, __g_end, __atoms))
            break;
    if (__grouping.size() != 0 && __g_end-__g < __num_get_base::__num_get_buf_sz)
        *__g_end++ = __dc;
    // Stage 3
    __v = __num_get_signed_integral<long>(__a, __a_end, __err, __base);
    // Digit grouping checked
    __check_grouping(__grouping, __g, __g_end, __err);
    // EOF checked
    if (__b == __e)
        __err |= ios_base::eofbit;
    return __b;
}

With respect to setting eofbit, they are identical. Here is test code demonstrating this generic behavior:

#include <sstream>
#include <string>
#include <cassert>

template <class T>
void test1()
{
    std::istringstream ss("1");
    T t;
    ss >> t;
    assert(ss.eof());
}

template <class T>
void test2()
{
    std::istringstream ss("1 ");
    T t;
    ss >> t;
    assert(!ss.eof());
}

int main()
{
    test1<bool>();
    test1<int>();
    test1<double>();
    test1<std::string>();
    test1<char>();

    test2<bool>();
    test2<int>();
    test2<double>();
    test2<std::string>();
    test2<char>();
}

Could you point me towards the boost code that is having problems?

Thanks.

llvmbot commented 13 years ago

Looking at libstdc++ (I'm not positive this is right), it uses sbumpc() to grab the character. If sbumpc() returns a character then that's fine; if it returns eof then eofbit is set.

The difference, as I understand it, is that libc++ proactively goes out of its way to set eofbit. Consider the following code:

#include <iostream>
#include <sstream>

using namespace std;

void f1(void)
{
    istringstream ss("123");
    char c;
    for (int i = 0; i < 4; ++i)
    {
        ss.get(c);
        cout << c << ":";
        cout << ss.eof() << ",";
    }
    cout << "\n";
}

void f2(void)
{
    istringstream ss("123");
    for (int i = 0; i < 4; ++i)
    {
        cout << ss.rdbuf()->sbumpc() << ":";
        cout << ss.eof() << ",";
    }
    cout << "\n";
}

int main(void)
{
    f1();
    f2();
}

g++ prints:

1:0,2:0,3:0,3:1,
49:0,50:0,51:0,-1:0,

So, it doesn't set eof until a call to sbumpc returns eof.

libc++ prints:

1:0,2:0,3:1,3:1,
49:0,50:0,51:0,-1:0,

So it reports eof one step earlier.

Either way, I believe (but we should check) that all other implementations agree with g++ (otherwise they would see many more failures in boost), so in practice that is probably what should be implemented, and then fixed in the standard. Unless of course there is some workaround buried deep in boost that I haven't found yet.

llvmbot commented 13 years ago

Doing only sgetc() isn't sufficient because then the next "ss >> c" would read the same character again. For every character extracted the "get pointer" must be incremented past that character. This is where sbumpc comes in.

Indeed, if this example went down to the streambuf level and called sgetc(), it would return '1' over and over and never set eofbit.
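
To make the sgetc()/sbumpc() distinction concrete, here is a small self-contained example (mine, not from the original report): sgetc() only peeks and never advances, while sbumpc() consumes the character and is therefore the call that eventually runs into eof.

#include <iostream>
#include <sstream>

int main()
{
    std::istringstream ss("1");
    std::streambuf* sb = ss.rdbuf();

    // sgetc() peeks: both calls return '1' (49) and the position never moves.
    std::cout << sb->sgetc() << " " << sb->sgetc() << "\n";    // 49 49

    // sbumpc() consumes: the first call returns '1', the second returns eof (typically -1).
    std::cout << sb->sbumpc() << " " << sb->sbumpc() << "\n";  // 49 -1
}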

llvmbot commented 13 years ago

This seems to just be a problem with extracting a single char.

I don't see (but I am not a stream expert!) why the char extractor ever needs to see sgetc() return traits::eof(). Reading a single character from the stream gets the character, and then operator>> can return. In this way char differs from int, etc., because no look-ahead is needed to check that you have reached the end of whatever you want to parse.
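
As a concrete contrast (my own example, not from the thread): extracting an int from "12" has to read one character past the last digit to know the number has ended, so eofbit is set even though the value was parsed successfully; extracting a single char needs no such look-ahead, and whether eofbit is set there is exactly the disagreement in this report.

#include <cassert>
#include <sstream>

int main()
{
    // The int parser must peek past "12" to see where the number ends,
    // so every implementation ends up with eofbit set here.
    std::istringstream si("12");
    int n = 0;
    si >> n;
    assert(n == 12 && si.eof());

    // A single char can be taken without peeking at what follows.
    // libstdc++ leaves eofbit clear here; libc++ (per this report) sets it.
    std::istringstream sc("1");
    char c = 0;
    sc >> c;
    assert(c == '1');
}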

llvmbot commented 13 years ago

[istream]/p3, in summarizing all of the istream extractors, says that if rdbuf()->sbumpc() or rdbuf()->sgetc() returns traits::eof(), then the input function, except as explicitly noted otherwise, completes its actions and does setstate(eofbit), which may throw ios_base::failure (27.5.4.3), before returning.

In your example the gptr() must be bumped by one after extracting '1', and that increment hits the eof. I believe not setting eofbit at this point would be in contradiction to [istream]/p3.
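
A small illustration of that argument (my own example): after copying the last character out of an istreambuf_iterator, the only way to know whether the incremented iterator equals the end-of-stream iterator is to ask the streambuf for another character, and on a one-character stream that check reports eof.

#include <iostream>
#include <iterator>
#include <sstream>

int main()
{
    std::istringstream ss("1");
    std::istreambuf_iterator<char> it(ss), end;

    char c = *it;                   // peek at '1' without advancing
    bool at_end = (++it == end);    // ++ consumes '1'; the comparison then has to ask
                                    // the streambuf whether another character exists,
                                    // and that check reports eof
    std::cout << c << " " << std::boolalpha << at_end << "\n";  // prints: 1 true
}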

Is this behavior causing problems? If so, perhaps we should open an issue.